4.5 Non-transactional pipelines

When we first introduced MULTI/EXEC in chapter 3, we talked about them as having a “transaction” property—everything between the MULTI and EXEC commands will execute without other clients being able to do anything. One benefit to using transactions is the underlying library’s use of a pipeline, which improves performance. This section will show how to use a pipeline without a transaction to further improve performance.

You’ll remember from chapter 2 that some commands take multiple arguments for adding or updating: MGET, MSET, HMGET, HMSET, RPUSH/LPUSH, SADD, ZADD, and others. Those commands exist to streamline calls that perform the same operation repeatedly. As you saw in chapter 2, this can result in significant performance improvements. Though the gains aren’t as drastic as with these commands, non-transactional pipelines offer many of the same performance advantages and allow us to run a variety of commands at the same time.

In the case where we don’t need transactions, but where we still want to do a lot of work, we could still use MULTI/EXEC for their ability to send all of the commands at the same time to minimize round trips and latency. Unfortunately, MULTI and EXEC aren’t free, and can delay other important commands from executing. But we can gain all the benefits of pipelining without using MULTI/EXEC. When we used MULTI/EXEC in Python in chapter 3 and in section 4.4, you may have noticed that we did the following:

pipe = conn.pipeline()

By passing True to the pipeline() method (or omitting it), we’re telling our client to wrap the sequence of commands that we’ll call with a MULTI/EXEC pair. If instead of passing True we were to pass False, we’d get an object that prepared and collected commands to execute similar to the transactional pipeline, only it wouldn’t be wrapped with MULTI/EXEC. For situations where we want to send more than one command to Redis, the result of one command doesn’t affect the input to another, and we don’t need them all to execute transactionally, passing False to the pipeline() method can further improve overall Redis performance. Let’s look at an example.
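As a mental model before the concrete example, the toy classes below mimic the buffering behavior just described. This is only a sketch: ToyConnection, ToyPipeline, and demo() are hypothetical names invented for illustration, no real Redis server or client library is involved, and the real client's internals may differ. The sketch counts how many times a batch of commands would cross the network.

```python
class ToyConnection:
    """Hypothetical stand-in for a Redis client; it only counts round trips."""

    def __init__(self):
        self.round_trips = 0
        self.sent = []  # every batch "sent" to the server, in order

    def command(self, *args):
        # A direct call sends one command and waits for its reply.
        self._send([args])

    def pipeline(self, transaction=True):
        return ToyPipeline(self, transaction)

    def _send(self, batch):
        self.round_trips += 1
        self.sent.append(batch)


class ToyPipeline:
    """Buffers commands locally until execute() is called."""

    def __init__(self, conn, transaction):
        self.conn = conn
        self.transaction = transaction
        self.buffer = []

    def command(self, *args):
        self.buffer.append(args)
        return self

    def execute(self):
        batch = list(self.buffer)
        if self.transaction:
            # Only the transactional variant adds the MULTI/EXEC wrapper.
            batch = [('MULTI',)] + batch + [('EXEC',)]
        self.buffer = []
        self.conn._send(batch)  # the whole batch travels in one round trip


def demo():
    commands = [('HSET', 'login:', 'token', 'user'),
                ('ZADD', 'recent:', 1.0, 'token'),
                ('ZADD', 'viewed:token', 1.0, 'item')]

    direct = ToyConnection()
    for cmd in commands:
        direct.command(*cmd)

    plain = ToyConnection()
    pipe = plain.pipeline(False)
    for cmd in commands:
        pipe.command(*cmd)
    pipe.execute()

    wrapped = ToyConnection()
    pipe = wrapped.pipeline(True)
    for cmd in commands:
        pipe.command(*cmd)
    pipe.execute()

    return (direct.round_trips, plain.round_trips,
            len(plain.sent[0]), len(wrapped.sent[0]))
```

In this toy model, three direct calls cost three round trips, while either kind of pipeline costs one. The only difference between the two pipelines is the two extra commands (MULTI and EXEC) in the transactional batch.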

Way back in sections 2.1 and 2.5, we wrote and updated a function called update_token(), which kept a record of recent items viewed and recent pages viewed, and kept the user’s login cookie updated. The updated code from section 2.5 is shown in listing 4.7. Note how the function will make three or five calls to Redis for every call of the function. As written, that will result in three or five round trips between Redis and our client.

Listing 4.7 The update_token() function from section 2.5
import time

def update_token(conn, token, user, item=None):
    # Get the timestamp.
    timestamp = time.time()
    # Keep a mapping from the token to the logged-in user.
    conn.hset('login:', token, user)
    # Record when the token was last seen.
    conn.zadd('recent:', token, timestamp)
    if item:
        # Record that the user viewed the item.
        conn.zadd('viewed:' + token, item, timestamp)
        # Remove old items, keeping the most recent 25.
        conn.zremrangebyrank('viewed:' + token, 0, -26)
        # Update the number of times the given item was viewed.
        conn.zincrby('viewed:', item, -1)

If our Redis and web servers are connected over a LAN with only one or two network hops between them, we could expect a round trip between the web server and Redis to take around 1–2 milliseconds. With three to five round trips between Redis and the web server, update_token() would take 3–10 milliseconds to execute. At that speed, a single web server thread could handle only 100–333 requests per second. This is great, but we could do better. Let’s quickly create a nontransactional pipeline and make all of our requests over that pipeline. You can see the updated function in listing 4.8.
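The arithmetic behind these estimates is simple enough to write down. The helper below is a hypothetical illustration, not part of the book's code. It assumes a single thread that does nothing but wait on each round trip, so it gives an upper bound rather than a measurement.

```python
def requests_per_second(round_trips, latency_ms):
    """Upper bound on calls per second for one thread that waits on every round trip."""
    return 1000.0 / (round_trips * latency_ms)


# Three round trips at 1 ms each, and five round trips at 2 ms each.
best_case = requests_per_second(3, 1.0)
worst_case = requests_per_second(5, 2.0)

# The same formula with a single round trip per call.
single_trip_fast = requests_per_second(1, 1.0)
single_trip_slow = requests_per_second(1, 2.0)
```

With the numbers from the text, the first pair works out to roughly 333 and 100 requests per second. Collapsing each call to a single round trip raises the bound to 1000 and 500.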

Listing 4.8 The update_token_pipeline() function
def update_token_pipeline(conn, token, user, item=None):
    timestamp = time.time()
    # Set up the pipeline.
    pipe = conn.pipeline(False)
    pipe.hset('login:', token, user)
    pipe.zadd('recent:', token, timestamp)
    if item:
        pipe.zadd('viewed:' + token, item, timestamp)
        pipe.zremrangebyrank('viewed:' + token, 0, -26)
        pipe.zincrby('viewed:', item, -1)
    # Execute the commands in the pipeline.
    pipe.execute()
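The listings in this section use the argument order of the older Python client. In redis-py 3.0 and later, zadd() takes a mapping of members to scores and zincrby() takes the amount before the member, so the calls above would need adjusting. The variant below is a sketch under that assumption. The name update_token_pipeline_v3 is hypothetical, and conn is assumed to be a redis.Redis client from redis-py 3.0 or newer.

```python
import time


def update_token_pipeline_v3(conn, token, user, item=None):
    """Variant of listing 4.8 using redis-py 3.0+ argument conventions.

    The function name is hypothetical; conn is assumed to be a redis.Redis client.
    """
    timestamp = time.time()
    pipe = conn.pipeline(transaction=False)   # no MULTI/EXEC wrapper
    pipe.hset('login:', token, user)
    pipe.zadd('recent:', {token: timestamp})  # mapping of member -> score
    if item:
        pipe.zadd('viewed:' + token, {item: timestamp})
        pipe.zremrangebyrank('viewed:' + token, 0, -26)
        pipe.zincrby('viewed:', -1, item)     # amount comes before the member
    # Send every buffered command at once and return the list of replies.
    return pipe.execute()
```

Returning the result of execute() is an addition for convenience; the original function discards the replies.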

By replacing our standard Redis connection with a pipelined connection, we can reduce our number of round trips by a factor of 3–5, and reduce the expected time to execute update_token_pipeline() to 1–2 milliseconds. At that speed, a single web server thread could handle 500–1000 requests per second if it only had to deal with updating item view information. Theoretically, this is great, but what about in reality?

Let’s test both of these functions by performing a simple benchmark. We’ll test the number of requests that can be processed per second against a copy of Redis that’s on the same machine, across a fast and low-latency network connection, and across a slow and higher-latency connection. We’ll start with the benchmark code that we’ll use to test the performance of these connections. In our benchmark, we’ll call either update_token() or update_token_pipeline() repeatedly until we reach a prespecified timeout, and then calculate the number of requests we can service per second. The following listing shows the code that we’ll use to run our two update_token functions.

Listing 4.9 The benchmark_update_token() function
def benchmark_update_token(conn, duration):
    # Execute both the update_token() and the update_token_pipeline() functions.
    for function in (update_token, update_token_pipeline):
        # Set up our counters and our ending conditions.
        count = 0
        start = time.time()
        end = start + duration
        # Call one of the two functions.
        while time.time() < end:
            count += 1
            function(conn, 'token', 'user', 'item')
        # Calculate the duration.
        delta = time.time() - start
        # Print information about the results.
        print(function.__name__, count, delta, count / delta)

When we run the benchmark function across a variety of connections with the given available bandwidth (gigabits or megabits) and latencies, we get data as shown in table 4.4.

Table 4.4 Performance of pipelined and nonpipelined connections over different types of connections. For high-speed connections, we’ll tend to run at the limit of what a single processor can perform for encoding/decoding commands in Redis. For slower connections, we’ll run at the limit of bandwidth and/or latency.
Description | Bandwidth | Latency | update_token() calls per second | update_token_pipeline() calls per second
Local machine, Unix domain socket | >1 gigabit | 0.015ms | 3,761 | 6,394
Local machine, localhost | >1 gigabit | 0.015ms | 3,257 | 5,991
Remote machine, shared switch | 1 gigabit | 0.271ms | 739 | 2,841
Remote machine, connected through VPN | 1.8 megabit | 48ms | 3.67 | 18.2

Looking at the table, note that for high-latency connections, pipelining improves throughput by roughly a factor of five (3.67 versus 18.2 calls per second over the VPN). Even with a very low-latency remote connection, we improve performance almost fourfold (739 versus 2,841 calls per second). For local connections, we actually run into the single-core performance limit of Python sending and receiving short command sequences using the Redis protocol (we’ll talk about this more in section 4.6).
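The speedup for each row can be checked directly from the figures in table 4.4. The snippet below is a small illustrative calculation. ROWS and speedups() are names made up for this sketch, and the numbers are copied from the table rather than measured again.

```python
# (description, update_token() calls/s, update_token_pipeline() calls/s) from table 4.4
ROWS = [
    ('Local machine, Unix domain socket', 3761, 6394),
    ('Local machine, localhost', 3257, 5991),
    ('Remote machine, shared switch', 739, 2841),
    ('Remote machine, connected through VPN', 3.67, 18.2),
]


def speedups(rows):
    """Return (description, pipelined / non-pipelined) for every row."""
    return [(desc, float(pipelined) / plain) for desc, plain, pipelined in rows]


for desc, ratio in speedups(ROWS):
    print('%-40s %.2fx' % (desc, ratio))
```

The two local rows come out below 2x, while the two remote rows come out near 4x and 5x. This fits the observation that the benefit of pipelining grows with network latency.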

You now know how to push Redis to perform better without transactions. Beyond using pipelines, are there any other standard ways of improving the performance of Redis?