9.2 Sharded structures
Sharding is a well-known technique that has been used to help many different databases
scale to larger data storage and processing loads. Basically, sharding takes your data,
partitions it into smaller pieces based on some simple rules, and then sends the data to
different locations depending on which partition the data had been assigned to.
In this section, we’ll talk about applying the concept of sharding to HASHes, SETs,
and ZSETs to support a subset of their standard functionality, while still letting us use
the small structures from section 9.1 to reduce memory use. Generally, instead of storing
value X in key Y, we’ll store X in key Y:<shardid>.
SHARDING LISTSSharding LISTs without the use of Lua scripting is difficult,
which is why we omit it here. When we introduce scripting with Lua in chapter
11, we’ll build a sharded LIST implementation that supports blocking and
nonblocking pushes and pops from both ends.
SHARDING ZSETSUnlike sharded HASHes and SETs, where essentially all operations
can be supported with a moderate amount of work (or even LISTs with
Lua scripting), commands like ZRANGE, ZRANGEBYSCORE, ZRANK, ZCOUNT, ZREMRANGE,
ZREMRANGEBYSCORE, and more require operating on all of the shards of
a ZSET to calculate their final result. Because these operations on sharded
ZSETs violate almost all of the expectations about how quickly a ZSET should
perform with those operations, sharding a ZSET isn’t necessarily that useful,
which is why we essentially omit it here.
If you need to keep full information for a large ZSET, but you only really perform
queries against the top- or bottom-scoring X, you can shard your ZSET in
the same way we shard HASHes in section 9.2.1: keeping auxiliary top/bottom
scoring ZSETs, which you can update with ZADD/ZREMRANGEBYRANK to keep
limited (as we’ve done previously in chapters 2 and 4–8).
You could also use sharded ZSETs as a way of reducing single-command latencies
if you have large search indexes, though discovering the final highestand
lowest-scoring items would take a potentially long series of ZUNIONSTORE/
When sharding structures, we can make a decision to either support all of the functionality
of a single structure or only a subset of the standard functionality. For the
sake of simplicity, when we shard structures in this book, we’ll only implement a subset
of the functionality offered by the standard structures, because to implement the full
functionality can be overwhelming (from both computational and code-volume perspectives).
Even though we only implement a subset of the functionality, we’ll use
these sharded structures to offer memory reductions to existing problems, or to solve
new problems more efficiently than would otherwise be possible.
The first structure we’ll talk about sharding is the HASH.