This book covers the use of Redis, an in-memory database/data structure server.

6.6 Distributing files with Redis

When building distributed software and systems, it’s common to need to copy, distribute,
or process data files on more than one machine. There are a few different common
ways of doing this with existing tools. If we have a single server that will always
have files to be distributed, it’s not uncommon to use NFS or Samba to mount a path
or drive. If we have files whose contents change little by little, it’s also common to use
a piece of software called Rsync to minimize the amount of data to be transferred
between systems. Occasionally, when many copies need to be distributed among
machines, a protocol called BitTorrent can be used to reduce the load on the server
by partially distributing files to multiple machines, which then share their pieces
among themselves.

Unfortunately, all of these methods have a significant setup cost, and how much value
they provide depends on the situation. NFS and Samba can work well, but both can have significant issues
when network connections aren’t perfect (or even if they are perfect), due to the way
both of these technologies are typically integrated with operating systems. Rsync is
designed to handle intermittent connection issues, since each file or set of files can be
partially transferred and resumed, but it suffers from needing to download complete
files before processing can start, and requires interfacing our software with Rsync in
order to fetch the files (which may or may not be a problem). And though BitTorrent
is an amazing technology, it only really helps if we’re running into limits sending from
our server, or if our network is underutilized. It also relies on interfacing our software
with a BitTorrent client that may not be available on all platforms, and which may not
have a convenient method to fetch files.

Each of the three methods described also requires setup and maintenance of users,
permissions, and/or servers. Because we already have Redis installed, running, and
available, we’ll use Redis to distribute files instead. By using Redis, we bypass issues
that some other software has: our client handles connection issues well, we can fetch
the data directly with our clients, and we can start processing data immediately (no
need to wait for an entire file).
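
To make the idea of fetching data directly and processing it before the whole file has arrived more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not necessarily the approach developed in the rest of this section: the helper names upload_file and iter_chunks, the key name, and the 64 KB block size are all hypothetical. The only Redis operations it relies on are DELETE, APPEND, and GETRANGE, called through a connection object that exposes delete, append, and getrange methods, as a redis-py client does.

```python
CHUNK_SIZE = 64 * 1024  # hypothetical block size; tune for your workload


def upload_file(conn, key, fileobj, chunk_size=CHUNK_SIZE):
    """Copy a binary file-like object into a single Redis string, block by block.

    `conn` is assumed to provide delete(key) and append(key, value),
    as a redis-py client does. Returns the number of bytes written.
    """
    conn.delete(key)  # start from an empty value
    total = 0
    while True:
        block = fileobj.read(chunk_size)
        if not block:
            break
        conn.append(key, block)  # APPEND grows the stored string
        total += len(block)
    return total


def iter_chunks(conn, key, chunk_size=CHUNK_SIZE):
    """Yield the stored bytes one block at a time.

    `conn` is assumed to provide getrange(key, start, end), where `end`
    is inclusive. A consumer can process each block as soon as it is
    yielded instead of waiting for the whole file.
    """
    pos = 0
    while True:
        block = conn.getrange(key, pos, pos + chunk_size - 1)
        if not block:
            break  # nothing more is stored at this offset
        yield block
        pos += len(block)
```

With a real server, conn would be something like redis.Redis(), the file would be opened in binary mode, and a consumer would loop over iter_chunks(conn, "file:example") to handle each block as it arrives. Note that this sketch stops as soon as it reaches the end of what is currently stored; it has no way of knowing whether the uploader is still appending, so a real system needs some signal that the file is complete.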