Watch all RedisConf 2021 sessions on demand
A single instance (or shard) of Redis is very capable, but there are circumstances where you may want an index to span across multiple instances. For example, to increase throughput by parallelization or indexes that exceed the size of the instance. Say you want to perform an operation over a number of keys. An effective way to partition the keys is to provide a uniformly distributed slice of the keys to each partition, do whatever operations you need to do on each partition in parallel, then merge the results at the end.
To achieve a uniform distribution of keys in an ideal way, we’ll employ a non-cryptographic hashing algorithm. Any fast hashing function will do, although we’ll just use the familiar CRC-32 for this example. Most of the time humans encounter the output of hashing algorithm in hexadecimal (“my-cool-document” would yield a CRC-32 output of F9FDB2C9 as an example). The hexadecimal representation is just easier for humans, but it is merely an alternate representation of a decimal integer value, which means we can do calculations on this value.
Next we need to determine the number of partitions—this is should be at least double the number of instances. This allows for growth; if you need to add instances then you can move one partition to another instance. So, let’s say we have 3 instances and thus 6 partitions.
To calculate the the partition which the a particular document will live on, we’ll do the following operation:
HashFunction(document_id) mod number_of_partitions
If we’re “my-cool-document” with 6 partitions then it would work like this:
CRC32(“my-cool-document”) = F9FDB2C9 (hex) or 4194153161 (decimal)
4194153161 mod 6 = 5
In Redis Enterprise, you can control which shard a key goes to by using either a pre-defined regular expression or surrounding a part of the key with curly braces. So, for our example, we can set a key for this document to look like this:
Then we have another document, “my-other-document”, which yields a partition number of 3, so the key would looks like this:
If you have additional ancillary keys that you you’ll need to operate on that tied to this document you’ll want have them reside on the same shard so you can perform operations on both keys at the same time without encountering cross-lot errors. To do this you’ll need to add the same partition number as the actual document to the key.
Zooming out and inspecting your data you’ll see that your index now is pretty evenly divided among your partitions. You can parallelize the work that needs to be accomplished across each partition. When you have work that needs to be done across the entire index, your client will need to execute the same logic on each partition, get the results back, and do any merging that is required in the client.