4.1.3 Rewriting/compacting append-only files
After reading about AOF persistence, you’re probably wondering why snapshots exist
at all. If by using append-only files we can minimize our data losses to one second (or
essentially none at all), and minimize the time it takes to have data persisted to disk
on a regular basis, it would seem that our choice should be clear. But the choice is
actually not so simple: because every write to Redis causes a log of the command to be
written to disk, the append-only log file will continuously grow. Over time, a growing
AOF could cause your disk to run out of space. More commonly, the problem shows up
on restart: Redis replays every command in the AOF in order, so with a large AOF,
Redis can take a very long time to start up.
To solve the growing AOF problem, we can use BGREWRITEAOF, which will rewrite
the AOF to be as short as possible by removing redundant commands. BGREWRITEAOF
works similarly to the snapshotting BGSAVE: performing a fork and subsequently
rewriting the append-only log in the child. As such, all of the same limitations with
snapshotting performance regarding fork time, memory use, and so on still stand
when using append-only files. Even worse, because an AOF left uncontrolled can grow
to many times the size of a dump, the OS must delete the old AOF once the rewrite
finishes, and deleting an AOF of tens of gigabytes can cause the system to hang for
multiple seconds.
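To make "removing redundant commands" concrete, here is a toy model of compaction. It is only a sketch: the compact function and its three-command vocabulary (SET, INCR, DEL) are hypothetical, and it works by replaying a log, whereas Redis itself writes the new AOF from the dataset held in memory by the forked child. The point is simply that many commands can collapse into the few needed to rebuild the final state.

```python
def compact(log):
    """Replay a toy command log and return a minimal equivalent log.

    `log` is a list of tuples such as ("SET", key, value),
    ("INCR", key), or ("DEL", key). Only these three commands are
    modeled; this is an illustration, not Redis's rewrite algorithm.
    """
    state = {}
    for command, key, *args in log:
        if command == "SET":
            state[key] = args[0]
        elif command == "INCR":
            # A missing key counts as 0, as it does for INCR in Redis.
            state[key] = int(state.get(key, 0)) + 1
        elif command == "DEL":
            state.pop(key, None)
        else:
            raise ValueError("unsupported command: %r" % (command,))
    # One SET per surviving key is enough to reproduce the final state.
    return [("SET", key, str(value)) for key, value in state.items()]


log = [
    ("SET", "hits", "0"),
    ("INCR", "hits"),
    ("INCR", "hits"),
    ("SET", "temp", "x"),
    ("DEL", "temp"),
    ("INCR", "hits"),
]
compacted = compact(log)
print(compacted)  # [('SET', 'hits', '3')]
```

Six logged commands shrink to one, and the key that was created and then deleted leaves no trace in the compacted log.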
With snapshots, we could use the save configuration option to enable the automatic
writing of snapshots using BGSAVE. Using AOFs, there are two configuration options that
enable automatic BGREWRITEAOF execution: auto-aof-rewrite-percentage and
auto-aof-rewrite-min-size. With the example values of auto-aof-rewrite-percentage 100
and auto-aof-rewrite-min-size 64mb, and with AOF enabled, Redis will initiate a
BGREWRITEAOF once both conditions hold: the AOF is at least 100% larger (that is, at
least double the size) than it was when Redis last finished rewriting it, and the AOF
is at least 64 megabytes in size. If our AOF is being rewritten too often, we can raise
auto-aof-rewrite-percentage above 100, though the AOF can then grow larger between
rewrites, so Redis will take longer to start up if it has been a while since the last
rewrite.
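The two-part rule can be written down as a small predicate. This is a simplified model of the rule as described in the text, not Redis's own code: should_rewrite and its parameter names are hypothetical, and Redis's internal check may treat exact boundary values slightly differently.

```python
MB = 1024 * 1024


def should_rewrite(current_size, base_size, percentage=100, min_size=64 * MB):
    """Return True if an automatic rewrite would be due under this model.

    current_size: size of the AOF now, in bytes.
    base_size:    size of the AOF right after the last rewrite, in bytes.
    percentage:   stands in for auto-aof-rewrite-percentage.
    min_size:     stands in for auto-aof-rewrite-min-size, in bytes.
    """
    if percentage <= 0:
        # A percentage of zero disables automatic rewrites.
        return False
    base = base_size if base_size > 0 else 1  # avoid dividing by zero
    growth = (current_size - base) * 100 // base  # percent growth since rewrite
    return current_size >= min_size and growth >= percentage


print(should_rewrite(128 * MB, 64 * MB))                  # True: doubled, above 64 MB
print(should_rewrite(100 * MB, 64 * MB))                  # False: grew only about 56%
print(should_rewrite(40 * MB, 10 * MB))                   # False: grew 300%, below 64 MB
print(should_rewrite(128 * MB, 64 * MB, percentage=200))  # False: needs 200% growth
```

The last call shows the trade-off mentioned above: with a higher percentage, the same 128 MB file no longer triggers a rewrite, so the AOF keeps growing until it reaches triple its post-rewrite size.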
Regardless of whether we choose append-only files or snapshots, having the data
on disk is a great first step. But unless our data has been backed up somewhere else
(preferably to multiple locations), we’re still leaving ourselves open to data loss.
Whenever possible, I recommend backing up snapshots and newly rewritten append-only
files to other servers.
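As a minimal sketch of that recommendation, the helper below copies a finished snapshot or rewritten AOF into a backup directory under a timestamped name. The function name backup_file and the directory layout are assumptions made for illustration, and the sketch assumes the backup directory is storage that lives on another server (a network mount, for example); shipping the file over the network with a separate tool would serve the same purpose.

```python
import os
import shutil
import time


def backup_file(source_path, backup_dir):
    """Copy source_path into backup_dir under a timestamped name.

    Returns the path of the finished backup copy.
    """
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    name = "%s.%s" % (os.path.basename(source_path), stamp)
    dest = os.path.join(backup_dir, name)
    # Copy to a temporary name first, then rename, so that a reader of
    # backup_dir never sees a half-written backup under the final name.
    partial = dest + ".partial"
    shutil.copy2(source_path, partial)
    os.replace(partial, dest)
    return dest
```

A call such as backup_file("/var/lib/redis/dump.rdb", "/mnt/backups/redis") would then produce a file like dump.rdb.20240101T120000 in the backup directory; both paths here are placeholders, not values taken from any real configuration.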
By using either append-only files or snapshots, we can keep our data between system
reboots or crashes. As load increases, or requirements for data integrity become
more stringent, we may need to look to replication to help us.