e-Book - Redis in Action

This book covers the use of Redis, an in-memory database/data structure server.

    9.1.3 Performance issues for long ziplists and intsets

    As our structures grow beyond the ziplist and intset limits, they’re automatically converted into their more typical underlying structure types. This is done primarily because manipulating the compact versions of these structures can become slow as they grow longer.

    For a firsthand look at how this happens, let’s start by updating our setting for
    list-max-ziplist-entries to 110,000. This setting is a lot larger than we’d ever use
    in practice, but it does help to highlight the issue. After that setting is updated and
    Redis is restarted, we’ll benchmark Redis to discover performance issues that can happen
    when long ziplist-encoded LISTs are used.
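    As a sketch, the change is a one-line edit to the redis.conf file your server
    reads at startup (the exact file path depends on your installation); the value
    is deliberately oversized and is meant only for this benchmark:

```
# redis.conf -- far larger than any sensible production value, so that
# even very long LISTs stay ziplist-encoded for the benchmark
list-max-ziplist-entries 110000
```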

    To benchmark the behavior of long ziplist-encoded LISTs, we’ll write a function
    that creates a LIST with a specified number of elements. After the LIST is created,
    we’ll repeatedly call the RPOPLPUSH command to move an item from the right end of
    the LIST to the left end. This will give us a lower bound on how expensive commands
    can be on very long ziplist-encoded LISTs. This benchmarking function is shown in
    the next listing.

    Listing 9.5  Our code to benchmark varying sizes of ziplist-encoded LISTs

    import time                          # Needed for the timing below.

    def long_ziplist_performance(conn, key, length, passes, psize):
        # We'll parameterize everything so that we can measure
        # performance in a variety of ways.
        conn.delete(key)                 # Start by deleting the named key to ensure
                                         # that we benchmark exactly what we intend to.
        conn.rpush(key, *range(length))  # Initialize the LIST by pushing our desired
                                         # count of numbers onto the right end.
        pipeline = conn.pipeline(False)  # Prepare a pipeline so that we are less
                                         # affected by network round-trip times.
        t = time.time()                  # Start the timer.
        for p in xrange(passes):         # Each pipeline execution will include
            for pi in xrange(psize):     # psize actual calls to RPOPLPUSH.
                # Each call pops the rightmost item from the LIST and
                # pushes it onto the left end of the same LIST.
                pipeline.rpoplpush(key, key)
            pipeline.execute()           # Execute the psize calls to RPOPLPUSH.
        # Calculate the number of calls per second that are performed.
        return (passes * psize) / (time.time() - t or .001)
    As mentioned before, this code creates a LIST of a given size and then executes a number
    of RPOPLPUSH commands in pipelines. By computing the number of calls to RPOPLPUSH
    divided by the amount of time it took, we can calculate a number of operations per
    second that can be executed on ziplist-encoded LISTs of a given size. Let’s run this with
    steadily increasing list sizes to see how overly long ziplists can reduce performance.

    Listing 9.6  As ziplist-encoded LISTs grow, we can see performance drop

    >>> long_ziplist_performance(conn, 'list', 1, 1000, 100)
    52093.558416505381
    >>> long_ziplist_performance(conn, 'list', 100, 1000, 100)
    51501.154762768667
    >>> long_ziplist_performance(conn, 'list', 1000, 1000, 100)
    49732.490843316067

    With LISTs encoded as ziplists at 1,000 entries or smaller, Redis is still
    able to perform around 50,000 operations per second or better.

    >>> long_ziplist_performance(conn, 'list', 5000, 1000, 100)
    43424.056529592635
    >>> long_ziplist_performance(conn, 'list', 10000, 1000, 100)
    36727.062573334966

    But as ziplist-encoded LISTs grow to 5,000 entries or more, memory-copy
    costs start to drag performance down.

    >>> long_ziplist_performance(conn, 'list', 50000, 1000, 100)
    16695.140684975777

    By the time we hit 50,000 entries in a ziplist, performance has dropped
    significantly.

    >>> long_ziplist_performance(conn, 'list', 100000, 500, 100)
    553.10821080054586

    And when we hit 100,000 entries, ziplists are effectively unusable.

    At first glance, you may be thinking that this isn’t so bad even when you let a ziplist
    grow to a few thousand elements. But this shows only a single example operation,
    where all we’re doing is taking items off of the right end and pushing them to the left
    end. The ziplist encoding can find the right or left end of a sequence quickly (though
    shifting all of the items over for the insert is what slows us down), and for this small example we can exploit our CPU caches. But when scanning through a list for a particular
    value, like our autocomplete example from section 6.1, or fetching/updating
    individual fields of a HASH, Redis will have to decode many individual entries, and CPU
    caches won’t be as effective. As a point of data, replacing our RPOPLPUSH command
    with a call to LINDEX that gets an element in the middle of the LIST shows performance
    at roughly half the number of operations per second as our RPOPLPUSH call
    when LISTs are at least 5,000 items long. Feel free to try it for yourself.
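    A minimal sketch of that LINDEX variant might look like the following (the
    function name is ours, it assumes the same redis-py-style connection object as
    listing 9.5, and it's written with Python 3's range rather than xrange):

```python
import time

def long_ziplist_index_performance(conn, key, length, passes, psize):
    # Same setup as long_ziplist_performance: a fresh LIST of `length` integers.
    conn.delete(key)
    conn.rpush(key, *range(length))
    pipeline = conn.pipeline(False)

    t = time.time()
    for _ in range(passes):
        for _ in range(psize):
            # Fetch an element from the middle of the LIST, forcing the
            # ziplist to be walked roughly halfway from one end.
            pipeline.lindex(key, length // 2)
        pipeline.execute()
    # Operations per second, as before.
    return (passes * psize) / (time.time() - t or .001)
```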

    If you keep your max ziplist sizes in the 500–2,000 item range, and you keep the
    max item size under 128 bytes or so, you should get reasonable performance. I personally
    try to keep max ziplist sizes to 1,024 elements with item sizes at 64 bytes or
    smaller. For many uses of HASHes that we’ve used so far, these limits should let you
    keep memory use down, and performance high.
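    In redis.conf terms, those personal limits would look something like the
    following sketch (setting names as of Redis 2.6; the specific values here
    are illustrative, so tune them for your own workload):

```
# redis.conf -- illustrative limits in the recommended range
list-max-ziplist-entries 1024
list-max-ziplist-value   64
hash-max-ziplist-entries 1024
hash-max-ziplist-value   64
zset-max-ziplist-entries 1024
zset-max-ziplist-value   64
set-max-intset-entries   1024
```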

    As you develop solutions to problems outside of our examples, remember that if
    you can keep your LIST, SET, HASH, and ZSET sizes small, you can help keep your memory
    use low, which can let you use Redis for a wider variety of problems.

    KEEPING KEY NAMES SHORT  One thing that I haven't mentioned before is the
    importance of minimizing the length of keys: both the key portion of all
    values, and keys in HASHes, members of SETs and ZSETs, and entries in
    LISTs. The longer all of your strings are, the more data needs to be
    stored. Generally speaking, whenever it's possible to store a relatively
    abbreviated piece of information like user:joe as a key or member, that's
    preferable to storing username:joe, or even joe if user or username is
    implied. Though in some cases it may not make a huge difference, if you're
    storing millions or billions of entries, those extra few megabytes or
    gigabytes may be useful later.
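    Back-of-the-envelope arithmetic makes the point concrete (the entry count
    below is illustrative, not measured):

```python
# Savings from shortening "username:" to "user:" (4 bytes per key),
# ignoring Redis's per-key overhead, which is the same either way.
saved_per_key = len("username:joe") - len("user:joe")    # 4 bytes
entries = 100_000_000                                    # 100 million keys
total_saved = saved_per_key * entries                    # 400,000,000 bytes
print(total_saved // (1024 * 1024), "MB saved")          # prints: 381 MB saved
```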

    Now that you’ve seen that short structures in Redis can be used to reduce memory
    use, in the next section we’ll talk about sharding large structures to let us gain the
    benefits of the ziplist and intset optimizations for more problems.