4.3.2 Replacing a failed master
When we’re running a group of Redis servers with replication and persistence, there
may come a time when some part of our infrastructure stops working for one reason
or another. Maybe we get a bad hard drive, maybe bad memory, or maybe the power
just went out. Regardless of what causes the system to fail, we’ll eventually need to
replace a Redis server. Let’s look at an example scenario with a master and a slave, in
which the master needs to be replaced.
Machine A is running a copy of Redis that’s acting as the master, and machine B is
running a copy of Redis that’s acting as the slave. Unfortunately, machine A has just
lost network connectivity for some reason that we haven’t yet been able to diagnose.
But we have machine C with Redis installed that we’d like to use as the new master.
Our plan is simple: We’ll tell machine B to produce a fresh snapshot with SAVE.
We’ll then copy that snapshot over to machine C. After the snapshot has been copied
into the proper path, we’ll start Redis on machine C. Finally, we’ll tell machine B to
become a slave of machine C.³ Some example commands to make this possible on this
hypothetical set of systems are shown in the following listing.
Most of these commands should be familiar to those who have experience using and
maintaining Unix or Linux systems. The only Redis-specific steps are initiating a
snapshot on machine B with the SAVE command and later making machine B a slave of
machine C with the SLAVEOF command.
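As a rough sketch of the plan described above, the following Python builds the four commands in order without running them. The hostnames machine-b and machine-c, the snapshot path /var/local/redis/dump.rdb, the configuration path /etc/redis/redis.conf, and the function name are all assumptions made for illustration; substitute whatever your own systems use.

```python
import shlex


def replace_master_commands(slave_host, new_master_host, port=6379,
                            rdb_path="/var/local/redis/dump.rdb",
                            conf_path="/etc/redis/redis.conf"):
    """Return the commands for the replacement plan, in order, as argv lists.

    Hosts and paths are hypothetical defaults, not values from any real setup.
    """
    return [
        # 1. Ask the surviving slave (machine B) to write a fresh snapshot.
        ["redis-cli", "-h", slave_host, "-p", str(port), "SAVE"],
        # 2. Copy the snapshot from machine B to machine C.
        ["scp", f"{slave_host}:{rdb_path}", f"{new_master_host}:{rdb_path}"],
        # 3. Start Redis on machine C. The config is assumed to point its
        #    data directory at the location the snapshot was copied to.
        ["ssh", new_master_host, "redis-server", conf_path,
         "--daemonize", "yes"],
        # 4. Tell machine B to replicate from machine C.
        ["redis-cli", "-h", slave_host, "-p", str(port),
         "SLAVEOF", new_master_host, str(port)],
    ]


# Print the commands for review; nothing is executed here.
for argv in replace_master_commands("machine-b", "machine-c"):
    print(shlex.join(argv))
```

Building the commands as lists keeps the sketch safe to run anywhere. Each list could later be handed to subprocess.run once the hosts and paths have been checked.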
As an alternative to creating a new master, we may want to turn the slave into a master
and create a new slave. Either way, Redis will be able to pick up where it left off. Our
only job from then on is to update our client configuration to read from and write to the
proper servers, and optionally to update the on-disk server configuration if we
need to restart Redis.
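The alternative of promoting the slave can be sketched in the same way. Sending SLAVEOF NO ONE to a slave stops replication and makes it act as a master. The sketch below assumes that Redis is already running on the machine that will become the new slave, and the hostnames and function name are again hypothetical.

```python
import shlex


def promote_slave_commands(slave_host, new_slave_host, port=6379):
    """Return the commands that promote a slave and attach a new slave to it.

    Hostnames are hypothetical; Redis is assumed to be running on both hosts.
    """
    return [
        # 1. Stop replication on machine B so that it acts as a master.
        ["redis-cli", "-h", slave_host, "-p", str(port),
         "SLAVEOF", "NO", "ONE"],
        # 2. Point the new machine at machine B so that it becomes the slave.
        ["redis-cli", "-h", new_slave_host, "-p", str(port),
         "SLAVEOF", slave_host, str(port)],
    ]


# Print the commands for review; nothing is executed here.
for argv in promote_slave_commands("machine-b", "machine-c"):
    print(shlex.join(argv))
```

With this approach no snapshot has to be copied by hand, because the new slave requests the data from the promoted master when it connects.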
REDIS SENTINEL A relatively recent addition to the collection of tools available
with Redis is Redis Sentinel. By the final publishing of this manuscript,
Redis Sentinel should be complete. Generally, Redis Sentinel pays attention
to Redis masters and the slaves of the masters and automatically handles
failover if the master goes down. We’ll discuss Redis Sentinel in chapter 10.
In the next section, we’ll talk about keeping our data from being corrupted by multiple
writers working on the same data, which is a necessary step toward keeping our data safe.
3 Because B was originally a slave, our clients shouldn’t have been writing to B, so we won’t have any race conditions
with clients writing to B after the snapshot operation was started.