
6.3.1 Building a basic counting semaphore

When building a counting semaphore, we run into many of the same concerns we had
with other types of locking. We must decide who got the lock, how to handle processes
that crashed with the lock, and how to handle timeouts. If we don’t care about timeouts,
or handling the case where semaphore holders can crash without releasing semaphores,
we could build semaphores fairly conveniently in a few different ways. Unfortunately,
those methods don’t lead us to anything useful in the long term, so I’ll describe one
method that we’ll incrementally improve to offer a full range of functionality.

Figure 6.6 Basic semaphore ZSET

In almost every case where we want to deal with timeouts in Redis, we’ll generally
look to one of two different methods. Either we’ll use EXPIRE like we did with our
standard locks, or we’ll use ZSETs. In this case, we want to use ZSETs, because that
allows us to keep information about multiple semaphore holders in a single structure.

More specifically, for each process that attempts to acquire the semaphore, we’ll
generate a unique identifier. This identifier will be the member of a ZSET. For the
score, we’ll use the timestamp for when the process attempted to acquire the semaphore.
Our semaphore ZSET will look something like figure 6.6.
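
To make the structure concrete, here’s a quick inspection sketch. It assumes a redis-py connection named conn; the key name 'semaphore:remote', the identifiers, and the timestamps in the sample output are only illustrative.

   # Fetch every current holder of the semaphore, oldest timestamp first.
   for member, score in conn.zrange('semaphore:remote', 0, -1, withscores=True):
       print(member, score)

   # Example output (illustrative):
   #   b'5b059f5c-...'   1326437034.113
   #   b'7dc64fb5-...'   1326437037.511
   #   b'e4f3f2b1-...'   1326437038.006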

When a process wants to attempt to acquire the semaphore, it first generates an identifier,
and then the process adds the identifier to the ZSET using the current timestamp
as the score. After adding the identifier, the process then checks its identifier’s rank.
If the rank returned is lower than the total allowed count (Redis uses 0-indexing on
rank), then the caller has acquired the semaphore. Otherwise, the caller doesn’t have
the semaphore and must delete its identifier from the ZSET. To handle timeouts, before
adding our identifier to the ZSET, we first clear out any entries whose timestamps are
older than our timeout. The code to acquire the semaphore can be seen next.

Listing 6.12 The acquire_semaphore() function
import time
import uuid

def acquire_semaphore(conn, semname, limit, timeout=10):
    identifier = str(uuid.uuid4())              # A 128-bit random identifier.
    now = time.time()

    pipeline = conn.pipeline(True)
    pipeline.zremrangebyscore(semname, '-inf', now - timeout)  # Time out old semaphore holders.
    pipeline.zadd(semname, identifier, now)     # Try to acquire the semaphore.
                                                # (redis-py 3.0+ expects a mapping: {identifier: now})
    pipeline.zrank(semname, identifier)
    if pipeline.execute()[-1] < limit:          # Check to see if we have it.
        return identifier

    conn.zrem(semname, identifier)              # We failed to get the semaphore; discard our identifier.
    return None

Our code proceeds as I’ve already described: generating the identifier, cleaning out
any timed-out semaphores, adding its identifier to the ZSET, and checking its rank.
Not too surprising.

Releasing the semaphore is easy: we remove the identifier from the ZSET, as can be
seen in the next listing.

Listing 6.13 The release_semaphore() function
def release_semaphore(conn, semname, identifier):
    return conn.zrem(semname, identifier)   # Returns True if the semaphore was properly
                                            # released, False if it had timed out.

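To show how the two functions fit together, here’s a minimal usage sketch. It assumes a reachable Redis server and a redis-py client compatible with the listings above; the connection setup and the 'remote-task' semaphore name are illustrative, not part of the listings.

   import redis

   conn = redis.Redis()                        # assumes a Redis server on localhost:6379

   identifier = acquire_semaphore(conn, 'remote-task', limit=5)
   if identifier:
       try:
           print('holding the semaphore')      # the work the semaphore guards goes here
       finally:
           release_semaphore(conn, 'remote-task', identifier)
   else:
       print('semaphore at capacity; back off and retry later')
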
This basic semaphore works well: it’s simple, and it’s very fast. But relying on every
process having access to the same system time in order to get the semaphore can
cause problems if we have multiple hosts. This isn’t a huge problem for our specific
use case, but consider two systems A and B, where A’s clock runs even 10 milliseconds
faster than B’s. If A got the last semaphore, and B then tried to get a semaphore within
10 milliseconds, B would actually “steal” A’s semaphore without A knowing it.
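
To see the failure concretely, here’s a small sketch that simulates the ZSET ranking in plain Python, with no Redis involved; the timestamps, the 10-millisecond skew, and the limit of 1 are illustrative.

   # Simulate the semaphore ZSET as (score, member) pairs kept sorted by score.
   limit = 1
   holders = []

   # Host A (clock 10 ms fast) acquires the last slot at its local time 100.010.
   holders.append((100.010, 'A'))
   holders.sort()
   rank_a = [m for _, m in holders].index('A')
   print('A acquired?', rank_a < limit)                 # True: rank 0 < limit

   # Host B attempts a few milliseconds later, but its slower clock reads 100.000,
   # which is *earlier* than A's recorded score.
   holders.append((100.000, 'B'))
   holders.sort()
   members = [m for _, m in holders]
   print('B acquired?', members.index('B') < limit)     # True: B's earlier score ranks first
   print('A still within the limit?',
         members.index('A') < limit)                    # False: A was silently pushed out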

Any time we have a lock or a semaphore where such a slight difference in the system
clock can drastically affect who can get the lock, the lock or semaphore is considered
unfair. Unfair locks and semaphores can cause clients that should’ve gotten the lock or
semaphore to never get it, and this is something that we’ll fix in the next section.