10.2 Scaling writes and memory capacity
Back in chapter 2, we built a system that could automatically cache rendered web
pages inside Redis. Fortunately for us, it helped reduce page load time and web page
processing overhead. Unfortunately, we’ve come to a point where we’ve scaled our
cache up to the largest single machine we can afford, and must now split our data
among a group of smaller machines.
SCALING WRITE VOLUMEThough we discuss sharding in the context of
increasing our total available memory, these methods also work to increase
write throughput if we’ve reached the limit of performance that a single
machine can sustain.
In this section, we’ll discuss a method to scale memory and write throughput with
sharding, using techniques similar to those we used in chapter 9.
To ensure that we really need to scale our write capacity, we should first make sure
we’re doing what we can to reduce memory and how much data we’re writing:
- Make sure that we’ve checked all of our methods to reduce read data volume
- Make sure that we’ve moved larger pieces of unrelated functionality to different
servers (if we’re using our connection decorators from chapter 5 already, this
should be easy).
- If possible, try to aggregate writes in local memory before writing to Redis, as we
discussed in chapter 6 (which applies to almost all analytics and statistics calculation
- If we’re running into limitations with WATCH/MULTI/EXEC, consider using locks
as we discussed in chapter 6 (or consider using Lua, as we’ll talk about in chapter
- If we’re using AOF persistence, remember that our disk needs to keep up with
the volume of data we’re writing (400,000 small commands may only be a few
megabytes per second, but 100,000 x 1 KB writes is 100 megabytes per second).
Now that we’ve done everything we can to reduce memory use, maximize performance,
and understand the limitations of what a single machine can do, it’s time to
actually shard our data to multiple machines. The methods that we use to shard our
data to multiple machines rely on the number of Redis servers used being more or
less fixed. If we can estimate that our write volume will, for example, increase 4 times
every 6 months, we can preshard our data into 256 shards. By presharding into 256
shards, we’d have a plan that should be sufficient for the next 2 years of expected
growth (how far out to plan ahead for is up to you).
PRESHARDING FOR GROWTHWhen presharding your system in order to prepare
for growth, you may be in a situation where you have too little data to
make it worth running as many machines as you could need later. To still be
able to separate your data, you can run multiple Redis servers on a single machine for each of your shards, or you can use multiple Redis databases
inside a single Redis server. From this starting point, you can move to multiple
machines through the use of replication and configuration management
(see section 10.2.1). If you’re running multiple Redis servers on a single
machine, remember to have them listen on different ports, and make sure
that all servers write to different snapshot files and/or append-only files.
The first thing that we need to do is to talk about how we’ll define our shard