e-Book - Redis in Action

This book covers the use of Redis, an in-memory database/data structure server.

    4.5 Non-transactional pipelines

    When we first introduced MULTI/EXEC in chapter 3, we talked about them as having a
    “transaction” property—everything between the MULTI and EXEC commands will execute
    without other clients being able to do anything. One benefit to using transactions is the underlying library’s use of a pipeline, which improves performance. This section
    will show how to use a pipeline without a transaction to further improve performance.

    You’ll remember from chapter 2 that some commands take multiple arguments for
    adding/updating—commands like MGET, MSET, HMGET, HMSET, RPUSH/LPUSH, SADD, ZADD, and others. Those commands exist to streamline calls that perform the same operation
    repeatedly. As you saw in chapter 2, this can result in significant performance
    improvements. Though the gains aren’t as dramatic as with these multi-argument commands,
    non-transactional pipelines offer many of the same performance advantages, and allow us
    to send a variety of different commands in a single batch.

    In the case where we don’t need transactions, but where we still want to do a lot of
    work, we could still use MULTI/EXEC for their ability to send all of the commands at the
    same time to minimize round trips and latency. Unfortunately, MULTI and EXEC aren’t
    free, and can delay other important commands from executing. But we can gain all the
    benefits of pipelining without using MULTI/EXEC. When we used MULTI/EXEC in Python
    in chapter 3 and in section 4.4, you may have noticed that we did the following:

    pipe = conn.pipeline()


    By passing True to the pipeline() method (or omitting it), we’re telling our client to
    wrap the sequence of commands that we’ll call with a MULTI/EXEC pair. If instead of
    passing True we were to pass False, we’d get an object that prepared and collected
    commands to execute similar to the transactional pipeline, only it wouldn’t be
    wrapped with MULTI/EXEC. For situations where we want to send more than one command
    to Redis, the result of one command doesn’t affect the input to another, and we
    don’t need them all to execute transactionally, passing False to the pipeline()
    method can further improve overall Redis performance. Let’s look at an example.
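
    Before turning to that example, it may help to picture what each kind of pipeline
    sends in its single batch. The sketch below is illustrative only: commands_sent() is
    a hypothetical helper, not part of any Redis client library, and the timestamp is a
    made-up value. It assumes the transactional pipeline simply brackets the queued
    commands with MULTI and EXEC, as described above.

```python
def commands_sent(queued, transaction=True):
    """Return the command sequence a pipeline would send in one batch.

    Hypothetical helper for illustration; `queued` is a list of command tuples.
    """
    batch = list(queued)
    if transaction:
        # A transactional pipeline brackets the queued commands with MULTI/EXEC.
        batch = [('MULTI',)] + batch + [('EXEC',)]
    return batch


# Two commands queued on the pipeline (timestamp value is made up).
queued = [
    ('HSET', 'login:', 'token', 'user'),
    ('ZADD', 'recent:', 1700000000, 'token'),
]

with_transaction = commands_sent(queued, transaction=True)
without_transaction = commands_sent(queued, transaction=False)

print("transactional batch: %d commands" % len(with_transaction))
print("non-transactional batch: %d commands" % len(without_transaction))
```

    Either way the batch travels in one round trip; the non-transactional form just
    leaves out the two extra commands and the isolation they provide.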

    Way back in sections 2.1 and 2.5, we wrote and updated a function called
    update_token(), which kept a record of recent items viewed and recent pages viewed,
    and kept the user’s login cookie updated. The updated code from section 2.5 is shown
    in listing 4.7. Note how the function will make three or five calls to Redis for every call
    of the function. As written, that will result in three or five round trips between Redis
    and our client.

    Listing 4.7 The update_token() function from section 2.5
    def update_token(conn, token, user, item=None):
        # Get the timestamp.
        timestamp = time.time()
        # Keep a mapping from the token to the logged-in user.
        conn.hset('login:', token, user)
        # Record when the token was last seen.
        conn.zadd('recent:', token, timestamp)
        if item:
            # Record that the user viewed the item.
            conn.zadd('viewed:' + token, item, timestamp)
            # Remove old items, keeping the most recent 25.
            conn.zremrangebyrank('viewed:' + token, 0, -26)
            # Update the number of times the given item was viewed.
            conn.zincrby('viewed:', item, -1)

    If our Redis and web servers are connected over a LAN with only one or two hops between
    them, we could expect a round trip between the web server and Redis to take around
    1–2 milliseconds. With three to five round trips between Redis and the web server, we could
    expect that it would take 3–10 milliseconds for update_token() to execute. At that
    speed, we could only expect a single web server thread to be able to handle 100–333
    requests per second. This is great, but we could do better. Let’s quickly create a
    non-transactional pipeline and make all of our requests over that pipeline. You can see the
    updated function in the next listing.

    Listing 4.8 The update_token_pipeline() function
    def update_token_pipeline(conn, token, user, item=None):
        timestamp = time.time()
        # Set up the pipeline.
        pipe = conn.pipeline(False)
        pipe.hset('login:', token, user)
        pipe.zadd('recent:', token, timestamp)
        if item:
            pipe.zadd('viewed:' + token, item, timestamp)
            pipe.zremrangebyrank('viewed:' + token, 0, -26)
            pipe.zincrby('viewed:', item, -1)
        # Execute the commands in the pipeline.
        pipe.execute()
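
    To make the round-trip difference concrete without a Redis server, here is a minimal
    sketch that only counts network round trips. CountingConnection, CountingPipeline,
    call(), and view_item() are hypothetical stand-ins invented for this illustration
    (they are not a real client API), and the timestamp is a made-up value. The five
    commands mirror the ones issued above when an item is passed.

```python
class CountingConnection(object):
    """Hypothetical stand-in for a Redis connection; it only counts round trips."""

    def __init__(self):
        self.round_trips = 0
        self.commands = []

    def _send(self, batch):
        # One network round trip, however many commands are in the batch.
        self.round_trips += 1
        self.commands.extend(batch)

    def call(self, *command):
        # Unpipelined: every command is its own round trip.
        self._send([command])

    def pipeline(self):
        return CountingPipeline(self)


class CountingPipeline(object):
    """Hypothetical pipeline: queues commands, sends them together on execute()."""

    def __init__(self, conn):
        self.conn = conn
        self.queued = []

    def call(self, *command):
        self.queued.append(command)

    def execute(self):
        self.conn._send(self.queued)
        self.queued = []


def view_item(target, token, user, item, timestamp):
    # The same five commands that the token update issues when an item is given.
    target.call('HSET', 'login:', token, user)
    target.call('ZADD', 'recent:', timestamp, token)
    target.call('ZADD', 'viewed:' + token, timestamp, item)
    target.call('ZREMRANGEBYRANK', 'viewed:' + token, 0, -26)
    target.call('ZINCRBY', 'viewed:', -1, item)


direct = CountingConnection()
view_item(direct, 'token', 'user', 'item', 1700000000)

batched = CountingConnection()
pipe = batched.pipeline()
view_item(pipe, 'token', 'user', 'item', 1700000000)
pipe.execute()

print("direct round trips: %d" % direct.round_trips)
print("batched round trips: %d" % batched.round_trips)
```

    Both connections end up with the same five commands; only the number of round
    trips differs (five versus one).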

    By replacing our standard Redis connection with a pipelined connection, we can
    reduce our number of round trips by a factor of 3–5, and reduce the expected time to
    execute update_token_pipeline() to 1–2 milliseconds. At that speed, a single web
    server thread could handle 500–1000 requests per second if it only had to deal with
    updating item view information. Theoretically, this is great, but what about in reality?
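
    The estimates in this section come from simple arithmetic: a thread that spends all
    of its time waiting on the network can complete at most one request per (round trips
    × latency). The helper below, requests_per_second(), is a hypothetical name used only
    to reproduce those figures; it ignores processing time on both the client and Redis.

```python
def requests_per_second(round_trips, latency_ms):
    """Upper bound on requests/second for one thread limited only by latency."""
    return 1000.0 / (round_trips * latency_ms)


# Without a pipeline: three to five round trips at 1-2 ms each.
unpipelined_low = requests_per_second(5, 2.0)    # worst case, 10 ms per call
unpipelined_high = requests_per_second(3, 1.0)   # best case, 3 ms per call

# With a pipeline: a single round trip at 1-2 ms.
pipelined_low = requests_per_second(1, 2.0)
pipelined_high = requests_per_second(1, 1.0)

print("unpipelined: %.0f-%.0f requests/second" % (unpipelined_low, unpipelined_high))
print("pipelined:   %.0f-%.0f requests/second" % (pipelined_low, pipelined_high))
```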

    Let’s test both of these functions by performing a simple benchmark. We’ll test the
    number of requests that can be processed per second against a copy of Redis that’s on
    the same machine, across a fast and low-latency network connection, and across a slow
    and higher-latency connection. We’ll start with the benchmark code that we’ll use
    to test the performance of these connections. In our benchmark, we’ll call either
    update_token() or update_token_pipeline() repeatedly until we reach a prespecified
    timeout, and then calculate the number of requests we were able to service per second. The
    following listing shows the code that we’ll use to run our two update_token functions.

    Listing 4.9 The benchmark_update_token() function
    def benchmark_update_token(conn, duration):
        # Execute both the update_token() and the update_token_pipeline() functions.
        for function in (update_token, update_token_pipeline):
            # Set up our counters and our ending conditions.
            count = 0
            start = time.time()
            end = start + duration
            while time.time() < end:
                count += 1
                # Call one of the two functions.
                function(conn, 'token', 'user', 'item')
            # Calculate the duration.
            delta = time.time() - start
            # Print information about the results.
            print function.__name__, count, delta, count / delta

    When we run the benchmark function across a variety of connections with the given
    available bandwidth (gigabits or megabits) and latencies, we get data as shown in
    table 4.4.

    Table 4.4 Performance of pipelined and nonpipelined connections over different types of connections. For high-speed connections, we’ll tend to run at the limit of what a single processor can perform for encoding/decoding commands in Redis. For slower connections, we’ll run at the limit of bandwidth and/or latency.

    Description                             Bandwidth     Latency    update_token()      update_token_pipeline()
                                                                     calls per second    calls per second
    --------------------------------------  ------------  ---------  ------------------  -----------------------
    Local machine, Unix domain socket       >1 gigabit    0.015 ms   3,761               6,394
    Local machine, localhost                >1 gigabit    0.015 ms   3,257               5,991
    Remote machine, shared switch           1 gigabit     0.271 ms   739                 2,841
    Remote machine, connected through VPN   1.8 megabit   48 ms      3.67                18.2

    Looking at the table, note that for high-latency connections, we can multiply performance
    by a factor of five by using pipelines. Even with very low-latency
    remote connections, we’re able to improve performance by almost four times.
    For local connections, we actually run into the single-core performance limit of
    Python sending and receiving short command sequences using the Redis protocol
    (we’ll talk about this more in section 4.6).
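
    The speedup factors quoted above can be checked directly from the figures in
    table 4.4. The short sketch below just divides the pipelined rate by the nonpipelined
    rate for each row; the variable names are our own.

```python
# (description, nonpipelined calls/second, pipelined calls/second) from table 4.4
results = [
    ('Local machine, Unix domain socket', 3761, 6394),
    ('Local machine, localhost', 3257, 5991),
    ('Remote machine, shared switch', 739, 2841),
    ('Remote machine, connected through VPN', 3.67, 18.2),
]

speedups = {}
for description, plain, pipelined in results:
    # float() keeps the division exact under both Python 2 and Python 3.
    speedups[description] = round(pipelined / float(plain), 2)
    print("%-40s %.2fx" % (description, speedups[description]))
```

    The local rows improve by less than a factor of two, the shared-switch row by about
    3.8, and the VPN row by just under 5.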

    You now know how to push Redis to perform better without transactions. Beyond
    using pipelines, are there any other standard ways of improving the performance of
    Redis?