Data Persistence Concepts
Redis Enterprise is a fully durable database. It supports the following data persistence mechanisms:
- AOF (Append-Only File) data persistence – Every shard of a Redis database appends new lines to its persistent file in one of the following modes:
- every second (fast but less safe)
- every write (safer and slower)
- Snapshot – The entire point-in-time view of the dataset is written to persistent storage, across all shards of the database. The snapshot time is configurable.
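The trade-off between the two AOF modes above can be illustrated with a toy append-only log. This is a minimal sketch, not Redis Enterprise code: the class `AppendOnlyLog`, the policy names `"always"` and `"everysec"`, the JSON-lines record format and the `replay` helper are all assumptions made for illustration. It only shows that syncing on every write costs one disk flush per operation, while syncing about once per second leaves a window of writes that a crash could lose.

```python
import json
import os
import tempfile
import time


class AppendOnlyLog:
    """Toy append-only file with two sync policies (illustrative only)."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec"):
            raise ValueError("policy must be 'always' or 'everysec'")
        self.policy = policy
        self.file = open(path, "a", encoding="utf-8")
        self.last_sync = time.monotonic()
        self.sync_count = 0

    def _sync(self):
        # Push buffered data to the OS, then ask the OS to commit it to disk.
        self.file.flush()
        os.fsync(self.file.fileno())
        self.last_sync = time.monotonic()
        self.sync_count += 1

    def append(self, command, key, value=None):
        self.file.write(json.dumps([command, key, value]) + "\n")
        if self.policy == "always":
            # Safer and slower: one disk sync per write.
            self._sync()
        elif time.monotonic() - self.last_sync >= 1.0:
            # Faster but less safe: sync at most about once per second.
            # Simplification: checked only on append; a real server would
            # use a background timer instead.
            self._sync()

    def close(self):
        self._sync()
        self.file.close()


def replay(path):
    """Rebuild an in-memory dataset by re-applying every logged write."""
    data = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            command, key, value = json.loads(line)
            if command == "SET":
                data[key] = value
            elif command == "DEL":
                data.pop(key, None)
    return data


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "shard.aof")
    log = AppendOnlyLog(path, policy="always")
    log.append("SET", "user:1", "alice")
    log.append("SET", "user:2", "bob")
    log.append("DEL", "user:1")
    log.close()
    print(replay(path))
```

Replaying the file is the recovery half of the picture: when no copy of the dataset is left in memory, the dataset is rebuilt from whatever reached persistent storage, so the sync policy determines how many recent writes can be missing.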
Snapshots vs Backups
Snapshots and backups serve two different purposes. A snapshot supports data durability (i.e., automatic recovery of data when no copy of the dataset is left in memory), whereas a backup supports disaster recovery (i.e., when the entire cluster needs to be rebuilt from scratch).
Ephemeral vs. Persistent Storage
In cloud-native deployments, such as a public cloud, private cloud or virtual private cloud, ephemeral (instance) storage cannot be used for durability purposes. Instead, network-attached storage such as AWS EBS, Azure Managed Disks or GCP Persistent Disk is required. The reason is that ephemeral storage is, as its name suggests, ephemeral: when a cloud instance fails (a relatively common occurrence in cloud environments), the contents of its local disk are lost with it.
The Redis Enterprise cluster is designed to work with network-attached storage for data persistence. By default, every node in the cluster is connected to a network-attached storage resource, making the cluster immune to data loss events such as multiple node failures with no copies of the dataset left in RAM. This durability-proven architecture is illustrated in the following figure:
As illustrated above, in cases where there is no copy of the dataset left in RAM, Redis Enterprise will find the last copy of the dataset in the network-attached devices that were connected to the failed node, and use that to populate the Redis shard on the new cloud instance.
Data-Persistence at the Master or at the Slave Level?
By default, when data persistence is enabled, Redis Enterprise sets data persistence at the slave of each shard of the database. In this configuration there is no impact on performance, as the master shard is not affected by the slowness of the disk. On the other hand, replication adds some latency, which may break the data persistence SLA.
Therefore, Redis Enterprise also allows you to enable data persistence on both the master and slave shards. This is a more reliable configuration that doesn’t infringe on your data persistence SLA, but if the disk cannot keep up with the write throughput, it will affect the latency of your database, as Redis delays its processing when it cannot commit to disk.
If you use Redis Enterprise DBaaS deployments (Cloud or VPC), your database is automatically tuned with a storage engine and a shard configuration that support your persistent storage load. In an on-prem deployment, it’s recommended that you consult Redis Labs Solutions Architects regarding your sizing.
Data persistence options are shown in the figure below:
Enhanced Storage Engine
Redis Enterprise includes a few enhancements to the Redis storage engine to increase the throughput of the Redis core with data persistence enabled, and to better utilize cluster resources by allowing multiple Redis instances to run on the same cluster node without affecting performance:
- When AOF is used as a mechanism for data persistence, the size of the append-only file grows with every ‘write’ operation. An AOF rewrite process is then triggered to control the size of the file and reduce the recovery time from disk. By default (this is configurable), OSS Redis triggers a rewrite operation when the AOF has doubled in size since the previous rewrite. In a ‘write’-intensive scenario, the rewrite operation can block the main loop of Redis (as well as other Redis instances running on the same cluster node) from executing ongoing requests to disk. Redis Enterprise uses a greedy AOF rewrite algorithm that attempts both to postpone AOF rewrite operations as long as possible without infringing on the SLA for recovery time (a configurable parameter) and to prevent the file from reaching the disk space limits. Because the rewrite process is used sparingly, the overall throughput of a persistent Redis instance is much higher than it otherwise would be.
- The Redis Enterprise storage layer allows multiple Redis instances to write to the same persistent storage in a non-blocking way, i.e. a busy shard that is constantly writing to disk (e.g. during an AOF rewrite) will not block other shards from executing durable operations.
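The difference between the two rewrite triggers described in the first bullet above can be sketched as two small decision functions. The first is modeled on the OSS Redis settings `auto-aof-rewrite-percentage` (default 100, i.e. the file has doubled) and `auto-aof-rewrite-min-size` (default 64 MB). The second is purely hypothetical: Redis Enterprise's greedy algorithm is not described in detail here, so `greedy_should_rewrite`, its parameters and the 90% disk headroom are assumptions that only illustrate the stated goals of postponing the rewrite until the recovery-time SLA or the disk space limit is about to be reached.

```python
MB = 1024 * 1024
GB = 1024 * MB


def oss_should_rewrite(current_size, base_size, percentage=100, min_size=64 * MB):
    """Growth-based trigger, modeled on the OSS Redis auto-rewrite settings.

    base_size is the AOF size right after the previous rewrite.
    """
    if current_size < min_size:
        return False
    base = base_size or 1  # avoid dividing by zero before the first rewrite
    growth = current_size * 100 // base - 100
    return growth >= percentage


def greedy_should_rewrite(current_size, replay_bytes_per_sec,
                          recovery_sla_sec, disk_limit, headroom=0.9):
    """Hypothetical 'postpone as long as possible' trigger (an assumption,
    not Redis Enterprise's actual algorithm).

    Rewrite only when replaying the file would take longer than the
    recovery-time SLA, or when the file nears the disk space limit.
    """
    estimated_recovery_sec = current_size / replay_bytes_per_sec
    near_disk_limit = current_size >= headroom * disk_limit
    return estimated_recovery_sec >= recovery_sla_sec or near_disk_limit


if __name__ == "__main__":
    # A 10 GB file that grew from 64 MB: the growth rule fires, while the
    # hypothetical greedy rule keeps postponing (about 102 s to replay at
    # 100 MB/s, under a 300 s SLA, and far from a 100 GB disk limit).
    print(oss_should_rewrite(10 * GB, 64 * MB))
    print(greedy_should_rewrite(10 * GB, 100 * MB, 300, 100 * GB))
```

Under a write-heavy load the growth-based rule fires repeatedly as the file keeps doubling, whereas a rule driven by recovery time and disk space fires far less often, which is the intuition behind the throughput gain claimed above.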
A storage engine benchmark performed by Dell-EMC and Redis Labs showed that when using Redis Enterprise’s enhanced storage engine with Dell-EMC VMAX, Redis performance is nearly unaffected by the AOF every-write mode, as shown in the figure below:
More info on this benchmark can be found in the following resources:
- Your cloud can’t do that: 0.5M ops + ACID @<1msec latency!
- Benchmarking Redis Enterprise in full ACID configuration on EMC VMAX at over 0.5 Million ops/sec @ <1 ms latency
- Redis Enterprise and VMAX All Flash: Performance Assessment Tests and Best Practices