After we have our packed location codes, we only need to store them in STRINGs with
SETRANGE. But before we do so, we have to think for a moment about how many users
we’re going to be storing information about. For example, suppose that Twitter has 750
million users today (based on the observation that recently created users have IDs
greater than 750 million); we’d need over 1.5 gigabytes of space to store location information
for all Twitter users. Though most operating systems are able to reasonably allocate
large regions of memory, Redis limits us to 512 megabyte STRINGs, and due to
Redis’s clearing out of data when setting a value beyond the end of an existing STRING,
setting the first value at the end of a long STRING will take more time than would be
expected for a simple SETBIT call. Instead, we can use a technique similar to what we
used in section 9.2.1, and shard our data across a collection of STRINGs.
Unlike when we were sharding HASHes and SETs, we don’t have to worry about
being efficient by keeping our shards smaller than a few thousand elements, since we
can access an element directly without decoding any others. Similarly, we can write to
a given offset efficiently. Instead, our concern is more along the lines of being efficient
at a larger scale—specifically what will balance potential memory fragmentation,
as well as minimize the number of keys that are necessary. For this example, we’ll store
location information for 220 users (just over 1 million entries) per STRING, which will
use about 2 megabytes per STRING. In the next listing, we see the code for updating
location information for a user.
For the most part, there shouldn’t be anything surprising there. We calculate the location
code to store for a user, calculate the shard and the individual shard offset for the
user, and then store the location code in the proper location for the user. The only thing that’s strange and may not seem necessary is that we also update a ZSET that stores the highest-numbered user ID that has been seen. This is primarily important when calculating
aggregates over everyone we have information about (so we know when to stop).