Database replication provides a mechanism to ensure high availability. When replication is enabled, your dataset is replicated to a slave node, which is constantly synchronized with the master node. If the master node fails, an automatic failover occurs and the slave node is promoted to be the new master node. When the old master node recovers, it becomes the slave node of the new master node. This auto-failover mechanism guarantees data is served with minimal to no interruption.
When rack-zone awareness is used, additional, more advanced logic determines which nodes are designated as master or slave, as explained in Rack-zone awareness.
Note: Enabling replication has implications for the total database size, as explained in Database memory limit.
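The failover sequence described above can be pictured with a small toy model. This is only a conceptual sketch, not how the cluster implements failover: the `Node` class, the `fail_over` function, and the node names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Node:
    name: str
    role: str           # "master" or "slave"
    alive: bool = True


def fail_over(master: Node, slave: Node) -> Tuple[Node, Node]:
    """Return (master, slave) after checking whether the master is down."""
    if master.alive:
        return master, slave      # master is healthy, nothing changes
    slave.role = "master"         # the slave is promoted
    master.role = "slave"         # the old master will rejoin as a slave
    return slave, master


# Hypothetical two-node setup
node_a = Node("node-a", "master")
node_b = Node("node-b", "slave")

node_a.alive = False                        # simulate a master failure
master, slave = fail_over(node_a, node_b)
node_a.alive = True                         # the old master recovers

print(master.name, master.role)
print(slave.name, slave.role)
```

In this model the roles simply swap; a real cluster must also resynchronize the recovered node with the new master before it can serve as a usable slave.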
Redise Flash Replication Considerations
If you have Redise Flash configured for your cluster, it is recommended to enable the sequential replication feature using rladmin, because replication can be relatively slow for Redise Flash-enabled databases. In some cases, if sequential replication is not enabled, there is a risk of an Out Of Memory (OOM) situation. This does not cause data loss on the master shards, but replication to the slave shards may not succeed as long as the master is under a high write rate and multiple replications are running at the same time.
The rladmin command below sets the number of master shards on the same cluster node that are eligible to be replicated, and the number of slave shards on the same cluster node that can run the replication process.
The recommended sequential replication configuration sets both values to 1, i.e.:
$ rladmin tune cluster max_redis_forks 1 max_slave_full_syncs 1
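To see why capping concurrent replications helps, the following toy model may be useful. It is not the cluster's implementation: it assumes a limit of one concurrent full sync per node (mirroring the value 1 in the command above) and uses a semaphore to make the syncs run one after another. The function `full_sync` and the shard names are hypothetical.

```python
import threading
import time

MAX_SLAVE_FULL_SYNCS = 1   # illustrative limit, mirrors the value in the command above

sync_slots = threading.BoundedSemaphore(MAX_SLAVE_FULL_SYNCS)
lock = threading.Lock()
running = 0   # full syncs in progress right now
peak = 0      # highest number of simultaneous full syncs observed


def full_sync(shard: str) -> None:
    """Pretend to run a full sync, holding one slot for its duration."""
    global running, peak
    with sync_slots:              # blocks until a slot is free
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)          # stand-in for the data transfer
        with lock:
            running -= 1


threads = [threading.Thread(target=full_sync, args=(f"shard-{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent syncs:", peak)
```

With the limit set to 1, the four simulated syncs never overlap, so the memory needed for replication at any moment is bounded by a single sync rather than by all of them at once.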