4.2.2 Redis replication startup process
I briefly described what happens when a slave connects—that the master starts a snapshot
and sends that to the slave—but that’s the simple version. Table 4.2 lists all of the
operations that occur on both the master and slave when a slave connects to a master.
|Step||Master operations||Slave operations|
|1||(waiting for a command)||(Re-)connects to the master; issues the SYNC command|
|2||Starts BGSAVE operation; keeps a backlog of all write commands sent after BGSAVE||Serves old data (if any), or returns errors to commands (depending on configuration)|
|3||Finishes BGSAVE; starts sending the snapshot to the slave; continues holding a backlog of write commands||Discards all old data (if any); starts loading the dump as it’s received|
|4||Finishes sending the snapshot to the slave; starts sending the write command backlog to the slave||Finishes parsing the dump; starts responding to commands normally again|
|5||Finishes sending the backlog; starts live streaming of write commands as they happen||Finishes executing backlog of write commands from the master; continues executing commands as they happen|
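To make the ordering in table 4.2 concrete, here is a minimal in-memory sketch of the five steps. The ToyMaster and ToySlave classes are hypothetical stand-ins invented for illustration; they aren't part of Redis or any client library, and they ignore networking, forking, and the dump file format entirely.

```python
class ToyMaster:
    """Hypothetical in-memory stand-in for a Redis master."""

    def __init__(self):
        self.data = {}
        self.snapshot = None
        self.backlog = None   # None means no sync is in progress
        self.slaves = []      # slaves that receive the live stream

    def write(self, key, value):
        self.data[key] = value
        if self.backlog is not None:
            # Steps 2-4: remember writes made after the snapshot started.
            self.backlog.append((key, value))
        for slave in self.slaves:
            # Step 5: stream writes to synced slaves as they happen.
            slave.apply(key, value)

    def start_bgsave(self):
        # Step 2: take a point-in-time copy and start an empty backlog.
        self.snapshot = dict(self.data)
        self.backlog = []

    def finish_sync(self, slave):
        # Steps 3-4: send the snapshot first, then the backlog.
        slave.load_snapshot(self.snapshot)
        for key, value in self.backlog:
            slave.apply(key, value)
        # Step 5: from now on the slave gets the live stream.
        self.snapshot = None
        self.backlog = None
        self.slaves.append(slave)


class ToySlave:
    """Hypothetical in-memory stand-in for a Redis slave."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def load_snapshot(self, snapshot):
        # Step 3: all old data is discarded before the dump is loaded.
        self.data = dict(snapshot)

    def apply(self, key, value):
        self.data[key] = value


master = ToyMaster()
master.write("a", 1)
slave = ToySlave({"stale": 0})   # step 1: slave connects and issues SYNC
master.start_bgsave()
master.write("b", 2)             # lands in the backlog, not the snapshot
master.finish_sync(slave)
master.write("c", 3)             # delivered through the live stream
```

After the last write, the slave holds the same keys as the master, and the key it held before syncing is gone.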
With the method outlined in table 4.2, Redis manages to keep up with most loads during
replication, except in cases where network bandwidth between the master and
slave instances isn’t fast enough, or when the master doesn’t have enough memory to
fork and keep a backlog of write commands. Though it isn’t necessary, it’s generally
considered to be a good practice to have Redis masters only use about 50–65% of the
memory in our system, leaving approximately 30–45% for spare memory during
BGSAVE and command backlogs.
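As a quick illustration of that rule of thumb, the following sketch turns the 50–65% guideline into byte targets for a given machine. The function name dataset_budget and the 16 GiB figure are assumptions chosen for the example, not values from Redis itself.

```python
def dataset_budget(total_bytes, low_pct=50, high_pct=65):
    """Return (low, high) byte targets for the Redis dataset on one host.

    Integer percentages keep the arithmetic exact.
    """
    return total_bytes * low_pct // 100, total_bytes * high_pct // 100


GIB = 1024 ** 3
# Hypothetical host with 16 GiB of memory.
low, high = dataset_budget(16 * GIB)
```

Whatever lies above the upper target is what remains for the fork during BGSAVE and for the backlog of write commands.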
On the slave side of things, configuration is also simple. To configure the slave for
master/slave replication, we can either set the configuration option SLAVEOF host
port, or we can configure Redis during runtime with the SLAVEOF command. If we use
the configuration option, Redis will initially load whatever snapshot/AOF is currently
available (if any), and then connect to the master to start the replication process outlined
in table 4.2. If we run the SLAVEOF command, Redis will immediately try to connect
to the master, and upon success, will start the replication process outlined in table 4.2.
DURING SYNC, THE SLAVE FLUSHES ALL OF ITS DATA Just to make sure that we’re all on the same page (some users forget this the first time they try using slaves): when a slave initially connects to a master, any data that had been in memory
will be lost, to be replaced by the data coming from the master.
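For the runtime route, the sketch below shows roughly what sending SLAVEOF to a running server involves, using only the standard library. The helper names encode_command and send_slaveof are hypothetical, as are any host names passed to them; in practice a client library would normally do this for us.

```python
import socket

# The configuration-file form of the same setting is a single line:
#   slaveof <master-host> <master-port>


def encode_command(*args):
    """Encode a command as a Redis protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def send_slaveof(slave_host, slave_port, master_host, master_port, timeout=5.0):
    """Ask the server at slave_host:slave_port to replicate from the master.

    Returns the raw reply bytes from the server.
    """
    with socket.create_connection((slave_host, slave_port), timeout=timeout) as conn:
        conn.sendall(encode_command("SLAVEOF", master_host, master_port))
        return conn.recv(1024)
```

Keep in mind that the server receiving this command will drop its current data when the sync begins, so send_slaveof should only be pointed at an instance whose contents we can afford to lose.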
WARNING: REDIS DOESN’T SUPPORT MASTER-MASTER REPLICATION When shown
master/slave replication, some people get the mistaken idea that, because we
can set slaving options after startup using the SLAVEOF command, we can get
what’s known as multi-master replication by setting two Redis instances as
being SLAVEOF each other (some have even considered more than two in a
loop). Unfortunately, this does not work. At best, our two Redis instances will use
as much processor as they can, will be continually communicating back and
forth, and depending on which server we connect and try to read/write data
from/to, we may get inconsistent data or no data.
When multiple slaves attempt to connect to Redis, one of two different scenarios can
occur. Table 4.3 describes them.
|When additional slaves connect||Master operation|
|Before step 3 in table 4.2||All slaves will receive the same dump and same backlogged write commands.|
|On or after step 3 in table 4.2||While the master is finishing up the five steps for earlier slaves, a new sequence of steps 1-5 will start for the new slave(s).|
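The rule in table 4.3 boils down to a single comparison, sketched below. The function plan_for_new_slave and its return labels are hypothetical names used only to illustrate the table; they don't correspond to anything inside Redis.

```python
def plan_for_new_slave(master_step):
    """Decide how a newly connected slave is served.

    master_step is the step of table 4.2 that the master has reached for
    an earlier slave, or None if no sync is in progress.
    """
    if master_step is not None and master_step < 3:
        # The dump hasn't started going out yet, so it can be shared.
        return "share-dump"
    # Otherwise a fresh sequence of steps 1-5 is needed.
    return "new-sync"
```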
For the most part, Redis does its best to ensure that it doesn’t have to do more work
than is necessary. In some cases, slaves may try to connect at inopportune times (on or
after step 3 in table 4.2) and cause the master to start another dump. On the other hand,
even when multiple slaves connect at the same time and share one dump, the outgoing
bandwidth used to synchronize all of the slaves initially may cause other commands to
have difficulty getting through, and could cause general network slowdowns for other
devices on the same network.
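To get a feel for the bandwidth cost, we can make a rough back-of-the-envelope estimate. The sketch below assumes that the master sends a full copy of the dump to every slave over one shared outgoing link and that nothing else uses that link; the function name and the numbers are made up for illustration, and the result is only a lower bound.

```python
def initial_sync_seconds(snapshot_bytes, slave_count, link_bytes_per_sec):
    """Lower-bound time to push one full dump to every slave.

    Ignores the write command backlog, protocol overhead, and any
    other traffic on the link.
    """
    return snapshot_bytes * slave_count / link_bytes_per_sec


# Hypothetical numbers: a 1 GB dump, 4 slaves, and a 1 gigabit link
# (about 125,000,000 bytes per second).
seconds = initial_sync_seconds(1_000_000_000, 4, 125_000_000)
```

For the duration of that transfer the link is saturated, which is why other commands and other devices on the same network can be slowed down.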