EBOOK – REDIS IN ACTION

This book covers the use of Redis, an in-memory database/data structure server.

4.5 Non-transactional pipelines

When we first introduced MULTI/EXEC in chapter 3, we talked about them as having a
“transaction” property: everything between the MULTI and EXEC commands executes
without other clients being able to do anything in between. A side benefit of using
transactions is that the underlying library sends the commands through a pipeline,
which improves performance. This section shows how to use a pipeline without a
transaction to further improve performance.

You’ll remember from chapter 2 that some commands take multiple arguments for
adding/updating: commands like MGET, MSET, HMGET, HMSET, RPUSH/LPUSH, SADD, ZADD,
and others. Those commands exist to streamline calls that perform the same operation
repeatedly. As you saw in chapter 2, this can result in significant performance
improvements. Though the gains aren’t as drastic as with these commands,
non-transactional pipelines offer many of the same performance advantages, and allow
us to send a variety of commands at the same time.

In the case where we don’t need transactions, but where we still want to do a lot of
work, we could still use MULTI/EXEC for their ability to send all of the commands at the
same time to minimize round trips and latency. Unfortunately, MULTI and EXEC aren’t
free, and can delay other important commands from executing. But we can gain all the
benefits of pipelining without using MULTI/EXEC. When we used MULTI/EXEC in Python
in chapter 3 and in section 4.4, you may have noticed that we did the following:

pipe = conn.pipeline()


By passing True to the pipeline() method (or omitting it), we’re telling our client to
wrap the sequence of commands that we’ll call with a MULTI/EXEC pair. If instead of
passing True we were to pass False, we’d get an object that prepared and collected
commands to execute similar to the transactional pipeline, only it wouldn’t be
wrapped with MULTI/EXEC. For situations where we want to send more than one command
to Redis, the result of one command doesn’t affect the input to another, and we
don’t need them all to execute transactionally, passing False to the pipeline()
method can further improve overall Redis performance. Let’s look at an example.
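
Before turning to that example, the difference between the two modes can be sketched with a toy model. The ToyConnection and ToyPipeline classes below are hypothetical stand-ins written for illustration, not the real client library; they only record what would be written to the network. The one detail borrowed from the real client is that pipeline() takes a flag that decides whether the queued commands are wrapped in MULTI/EXEC.

```python
class ToyConnection:
    """Hypothetical stand-in for a Redis client; it only records traffic."""

    def __init__(self):
        self.sent = []  # one entry per network write

    def pipeline(self, transaction=True):
        return ToyPipeline(self, transaction)


class ToyPipeline:
    """Hypothetical pipeline: queues commands, then sends them in one write."""

    def __init__(self, conn, transaction):
        self.conn = conn
        self.transaction = transaction
        self.queued = []

    def command(self, *args):
        # Queue the command locally; nothing is sent yet.
        self.queued.append(' '.join(str(arg) for arg in args))
        return self

    def execute(self):
        commands = list(self.queued)
        if self.transaction:
            # Transactional mode wraps the batch in MULTI/EXEC.
            commands = ['MULTI'] + commands + ['EXEC']
        # Either way, the whole batch goes out in a single write.
        self.conn.sent.append(commands)
        self.queued = []


conn = ToyConnection()
for transaction in (True, False):
    pipe = conn.pipeline(transaction)
    pipe.command('HSET', 'login:', 'token', 'user')
    pipe.command('ZADD', 'recent:', 1, 'token')
    pipe.execute()

transactional, plain = conn.sent
print(transactional)
print(plain)
```

In this model both modes cost a single write, so the round-trip savings are the same; the non-transactional batch simply omits the MULTI and EXEC commands.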

Way back in sections 2.1 and 2.5, we wrote and updated a function called
update_token(), which kept a record of recent items viewed and recent pages viewed,
and kept the user’s login cookie updated. The updated code from section 2.5 is shown
in listing 4.7. Note how the function makes two calls to Redis when no item is passed,
and five calls when one is. As written, that results in two or five round trips between
Redis and our client.

Listing 4.7 The update_token() function from section 2.5
import time

def update_token(conn, token, user, item=None):
    # Get the timestamp.
    timestamp = time.time()
    # Keep a mapping from the token to the logged-in user.
    conn.hset('login:', token, user)
    # Record when the token was last seen.
    conn.zadd('recent:', token, timestamp)
    if item:
        # Record that the user viewed the item.
        conn.zadd('viewed:' + token, item, timestamp)
        # Remove old items, keeping the most recent 25.
        conn.zremrangebyrank('viewed:' + token, 0, -26)
        # Update the number of times the given item was viewed.
        conn.zincrby('viewed:', item, -1)

If our Redis and web servers are connected over a LAN with only one or two network hops,
we could expect that the round trip between the web server and Redis would be around
1–2 milliseconds. With two to five round trips between Redis and the web server, we could
expect that it would take 2–10 milliseconds for update_token() to execute. At that
speed, we could only expect a single web server thread to be able to handle 100–500
requests per second. This is great, but we could do better. Let’s quickly create a
non-transactional pipeline and make all of our requests over that pipeline. You can see the
updated function in the next listing.

Listing 4.8 The update_token_pipeline() function
import time

def update_token_pipeline(conn, token, user, item=None):
    timestamp = time.time()
    # Set up the pipeline.
    pipe = conn.pipeline(False)
    pipe.hset('login:', token, user)
    pipe.zadd('recent:', token, timestamp)
    if item:
        pipe.zadd('viewed:' + token, item, timestamp)
        pipe.zremrangebyrank('viewed:' + token, 0, -26)
        pipe.zincrby('viewed:', item, -1)
    # Execute the commands in the pipeline.
    pipe.execute()

By replacing our standard Redis connection with a pipelined connection, we can
reduce our number of round trips by a factor of 2–5, and reduce the expected time to
execute update_token_pipeline() to 1–2 milliseconds. At that speed, a single web
server thread could handle 500–1000 requests per second if it only had to deal with
updating item view information. Theoretically, this is great, but what about in reality?
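
The theoretical figures above can be reproduced with a small sketch that counts round trips instead of talking to Redis. CountingConnection and CountingPipeline are hypothetical stubs introduced here for illustration: every direct command costs one round trip, while a pipeline costs one round trip per execute(). The two functions are copied from listings 4.7 and 4.8, and the 1–2 millisecond round-trip times are the LAN estimate used in the text.

```python
import time


class CountingPipeline:
    """Hypothetical stub: queues commands and pays one round trip on execute()."""

    def __init__(self, conn):
        self.conn = conn
        self.queued = 0

    def __getattr__(self, name):
        # Any unknown attribute is treated as a Redis command that is only queued.
        def queue(*args):
            self.queued += 1
        return queue

    def execute(self):
        self.conn.round_trips += 1
        self.queued = 0


class CountingConnection:
    """Hypothetical stub: every direct command costs one round trip."""

    def __init__(self):
        self.round_trips = 0

    def pipeline(self, transaction=True):
        return CountingPipeline(self)

    def __getattr__(self, name):
        # Any unknown attribute is treated as a Redis command sent immediately.
        def call(*args):
            self.round_trips += 1
        return call


def update_token(conn, token, user, item=None):
    timestamp = time.time()
    conn.hset('login:', token, user)
    conn.zadd('recent:', token, timestamp)
    if item:
        conn.zadd('viewed:' + token, item, timestamp)
        conn.zremrangebyrank('viewed:' + token, 0, -26)
        conn.zincrby('viewed:', item, -1)


def update_token_pipeline(conn, token, user, item=None):
    timestamp = time.time()
    pipe = conn.pipeline(False)
    pipe.hset('login:', token, user)
    pipe.zadd('recent:', token, timestamp)
    if item:
        pipe.zadd('viewed:' + token, item, timestamp)
        pipe.zremrangebyrank('viewed:' + token, 0, -26)
        pipe.zincrby('viewed:', item, -1)
    pipe.execute()


def count_round_trips(function):
    conn = CountingConnection()
    function(conn, 'token', 'user', 'item')
    return conn.round_trips


direct = count_round_trips(update_token)
pipelined = count_round_trips(update_token_pipeline)
print('round trips:', direct, pipelined)

for rtt_ms in (1, 2):
    # Requests per second if round-trip latency were the only cost.
    print(rtt_ms, 1000.0 / (direct * rtt_ms), 1000.0 / (pipelined * rtt_ms))
```

This model ignores the time spent encoding commands and the time Redis spends executing them, so it gives an upper bound rather than a prediction.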

Let’s test both of these functions by performing a simple benchmark. We’ll test the
number of requests that can be processed per second against a copy of Redis that’s on
the same machine, across a fast and low-latency network connection, and across a slow
and higher-latency connection. We’ll first start with the benchmark code that we’ll use
to test the performance of these connections. In our benchmark, we’ll call either
update_token() or update_token_pipeline() repeatedly until we reach a prespecified
timeout, and then calculate the number of requests we can service per second. The
following listing shows the code that we’ll use to run our two update_token functions.

Listing 4.9 The benchmark_update_token() function
import time

def benchmark_update_token(conn, duration):
    # Execute both the update_token() and the update_token_pipeline() functions.
    for function in (update_token, update_token_pipeline):
        # Set up our counters and our ending conditions.
        count = 0
        start = time.time()
        end = start + duration
        while time.time() < end:
            count += 1
            # Call one of the two functions.
            function(conn, 'token', 'user', 'item')
        # Calculate the duration.
        delta = time.time() - start
        # Print information about the results.
        print(function.__name__, count, delta, count / delta)

When we run the benchmark function across a variety of connections with the given
available bandwidth (gigabits or megabits) and latencies, we get data as shown in
table 4.4.

Table 4.4 Performance of pipelined and nonpipelined connections over different types of connections. For high-speed connections, we’ll tend to run at the limit of what a single processor can perform for encoding/decoding commands in Redis. For slower connections, we’ll run at the limit of bandwidth and/or latency.

Description                            Bandwidth     Latency    update_token()     update_token_pipeline()
                                                                calls per second   calls per second
-------------------------------------  ------------  ---------  -----------------  -----------------------
Local machine, Unix domain socket      >1 gigabit    0.015 ms   3,761              6,394
Local machine, localhost               >1 gigabit    0.015 ms   3,257              5,991
Remote machine, shared switch          1 gigabit     0.271 ms   739                2,841
Remote machine, connected through VPN  1.8 megabit   48 ms      3.67               18.2

Looking at the table, note that for high-latency connections, pipelines improve
performance by a factor of five over not using them. Even with very low-latency
remote connections, we’re able to improve performance by almost four times.
For local connections, we actually run into the single-core performance limit of
Python sending and receiving short command sequences using the Redis protocol
(we’ll talk about this more in section 4.6).
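
As a rough cross-check of the table, the measured rates can be compared with a latency-only bound. This is a sketch under two assumptions that the table doesn’t state: that the Latency column is a full round-trip time, and that bandwidth and CPU cost are ignored. The benchmark always passes an item, so the nonpipelined function makes five round trips per call and the pipelined one makes one. The latency_bound() helper is a name introduced here for illustration.

```python
# (description, latency in ms, measured update_token() calls per second,
#  measured update_token_pipeline() calls per second), copied from table 4.4.
rows = [
    ('Unix domain socket', 0.015, 3761, 6394),
    ('localhost', 0.015, 3257, 5991),
    ('shared switch', 0.271, 739, 2841),
    ('VPN', 48.0, 3.67, 18.2),
]


def latency_bound(round_trips, latency_ms):
    """Calls per second if round-trip latency were the only cost (assumption)."""
    return 1000.0 / (round_trips * latency_ms)


for description, latency_ms, measured, measured_pipelined in rows:
    speedup = measured_pipelined / measured
    print('%-20s speedup %.2fx, latency bound %.1f (5 trips) / %.1f (1 trip)' % (
        description, speedup,
        latency_bound(5, latency_ms), latency_bound(1, latency_ms)))
```

Under those assumptions, the shared-switch bound for five round trips works out to about 738 calls per second, close to the measured 739. For the two local rows, the measured rates fall far below the latency bound, which is consistent with the CPU limit described above.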

You now know how to push Redis to perform better without transactions. Beyond
using pipelines, are there any other standard ways of improving the performance of
Redis?