11.4.4 Performing blocking pops from the sharded LIST
We’ve stepped through pushing items onto both ends of a long LIST, popping items
off both ends, and even written a function to get the total length of a sharded LIST. In
this section, we’ll build a method to perform blocking pops from both ends of the
sharded LIST. In previous chapters, we’ve used blocking pops to implement messaging
and task queues, though other uses are also possible.
Whenever possible, if we don’t need to actually block and wait on a request, we
should use the nonblocking versions of the sharded LIST pop methods. This is
because, with the current semantics and commands available to Lua scripting and
WATCH/MULTI/EXEC transactions, there are still some situations where we may receive
incorrect data. These situations are rare, and we’ll go through a few steps to try to prevent
them from happening, but every system has limitations.
In order to perform a blocking pop, we’ll cheat somewhat. First, we’ll try to perform
a nonblocking pop in a loop until we run out of time, or until we get an item. If
that works, then we’re done. If that doesn’t get an item, then we’ll loop over a few
steps until we get an item, or until our timeout is up.
The specific sequence of operations we’ll perform is to start by trying the nonblocking
pop. If that fails, then we fetch information about the first and last shard IDs.
If the IDs are the same, we then perform a blocking pop on that shard ID. Well, sort of.
Because the shard ID of the end we want to pop from could’ve changed since we
fetched the endpoints (due to round-trip latencies), we insert a pipelined Lua script
EVAL call just before the blocking pop. This script verifies whether we’re trying to pop
from the correct LIST. If we are, then it does nothing, and our blocking pop operation
occurs without issue. But if it’s the wrong LIST, then the script will push an extra
“dummy” item onto the LIST, which will then be popped with the blocking pop operation
There’s a potential race between when the Lua script is executed and when the
blocking pop operation is executed. If someone attempts to pop or push an item from that same shard between when the Lua script is executed and when the blocking pop
operation is executed, then we could get bad data (the other popping client getting
our dummy item), or we could end up blocking on the wrong shard.
WHY NOT USE A MULTI/EXEC TRANSACTION?We’ve talked a lot about MULTI/
EXEC transactions as a way of preventing race conditions through the other
chapters. So why don’t we use WATCH/MULTI/EXEC to prepare information,
and then use a BLPOP/BRPOP operation as the last command before EXEC?
This is because if a BLPOP/BRPOP operation occurs on an empty LIST as part of
a MULTI/EXEC transaction, it’d block forever because no other commands can
be run in that time. To prevent such an error, BLPOP/BRPOP operations within
a MULTI/EXEC block will execute as their nonblocking LPOP/RPOP versions
(except allowing the client to pass multiple lists to attempt to pop from).
To address the issue with blocking on the wrong shard, we’ll only ever block for one
second at a time (even if we’re supposed to block forever). And to address the issue
with our blocking pop operations getting data that wasn’t actually on the end shard,
we’ll operate under the assumption that if data came in between two non-transactional
pipelined calls, it’s close enough to being correct. Our functions for handling
blocking pops can be seen in the next listing.
There are a lot of pieces that come together to make this actually work, but remember
that there are three basic pieces. The first piece is a helper that handles the loop to
actually fetch the item. Inside this loop, we call the second piece, which is the helper/
blocking pop pair of functions, which handles the blocking portion of the calls. The
third piece is the API that users will actually call, which handles passing all of the
proper arguments to the helper.
For each of the commands operating on sharded LISTs, we could implement them
with WATCH/MULTI/EXEC transactions. But a practical issue comes up when there’s a
modest amount of contention, because each of these operations manipulates multiple
structures simultaneously, and will manipulate structures that are calculated as part of
the transaction itself. Using a lock over the entire structure can help somewhat, but
using Lua improves performance significantly.