Documentation - Redise Pack

A guide to Redise Pack installation, operation and administration

open all | close all

9.1.3 Performance issues for long ziplists and intsets

As our structures grow beyond the ziplist and intset limits, they’re automatically converted into their more typical underlying structure types. This is done primarily because manipulating the compact versions of these structures can become slow as they grow longer.

For a firsthand look at how this happens, let’s start by updating our setting for list-max-ziplist-entries to 110,000. This setting is a lot larger than we’d ever use in practice, but it does help to highlight the issue. After that setting is updated and Redis is restarted, we’ll benchmark Redis to discover performance issues that can happen when long ziplist-encoded LISTs are used.

To benchmark the behavior of long ziplist-encoded LISTs, we’ll write a function that creates a LIST with a specified number of elements. After the LIST is created, we’ll repeatedly call the RPOPLPUSH command to move an item from the right end of the LIST to the left end. This will give us a lower bound on how expensive commands can be on very long ziplist-encoded LISTs. This benchmarking function is shown in the next listing.

Listing 9.5Our code to benchmark varying sizes of ziplist-encoded LISTs
def long_ziplist_performance(conn, key, length, passes, psize):

We’ll parameterize everything so that we can measure performance in a variety of ways.

   conn.delete(key)

Start by deleting the named key to ensure that we only benchmark exactly what we intend to.

   conn.rpush(key, *range(length))

Initialize the LIST by pushing our desired count of numbers onto the right end.

   pipeline = conn.pipeline(False)

Prepare a pipeline so that we are less affected by network round-trip times.

   t = time.time()

Start the timer

   for p in xrange(passes):

We’ll perform a Each pipeline execution will include psize actual calls to RPOPLPUSH.

      for pi in xrange(psize):

Each pipeline execution will include psize actual calls to RPOPLPUSH.

         pipeline.rpoplpush(key, key)

Each call will result in popping the rightmost item from the LIST, pushing it to the left end of the same LIST.

      pipeline.execute()

Execute the psize calls to RPOPLPUSH.

   return (passes * psize) / (time.time() - t or .001)

Calculate the number of calls per second that are performed.

As mentioned before, this code creates a LIST of a given size and then executes a number of RPOPLPUSH commands in pipelines. By computing the number of calls to RPOPLPUSH divided by the amount of time it took, we can calculate a number of operations per second that can be executed on ziplist-encoded LISTs of a given size. Let’s run this with steadily increasing list sizes to see how long ziplists can reduce performance.

Listing 9.6As ziplist-encoded LISTs grow, we can see performance drop
>>> long_ziplist_performance(conn, 'list', 1, 1000, 100)
52093.558416505381
>>> long_ziplist_performance(conn, 'list', 100, 1000, 100)
51501.154762768667
>>> long_ziplist_performance(conn, 'list', 1000, 1000, 100)
49732.490843316067

With lists encoded as ziplists at 1000 entries or smaller, Redis is still able to perform around 50,000 operations per second or better.

>>> long_ziplist_performance(conn, 'list', 5000, 1000, 100)
43424.056529592635
>>> long_ziplist_performance(conn, 'list', 10000, 1000, 100)
36727.062573334966

But as lists encoded as ziplists grow to 5000 or more, performance starts to drop off as memory copy costs start reducing performance.

>>> long_ziplist_performance(conn, 'list', 50000, 1000, 100)
16695.140684975777

When we hit 50,000 entries in a ziplist, performance has dropped significantly.

>>> long_ziplist_performance(conn, 'list', 100000, 500, 100)
553.10821080054586

And when we hit 100,000 entries, ziplists are effectively unusable.

At first glance, you may be thinking that this isn’t so bad even when you let a ziplist grow to a few thousand elements. But this shows only a single example operation, where all we’re doing is taking items off of the right end and pushing them to the left end. The ziplist encoding can find the right or left end of a sequence quickly (though shifting all of the items over for the insert is what slows us down), and for this small example we can exploit our CPU caches. But when scanning through a list for a particular value, like our autocomplete example from section 6.1, or fetching/updating individual fields of a HASH, Redis will have to decode many individual entries, and CPU caches won’t be as effective. As a point of data, replacing our RPOPLPUSH command with a call to LINDEX that gets an element in the middle of the LIST shows performance at roughly half the number of operations per second as our RPOPLPUSH call when LISTs are at least 5,000 items long. Feel free to try it for yourself.

If you keep your max ziplist sizes in the 500–2,000 item range, and you keep the max item size under 128 bytes or so, you should get reasonable performance. I personally try to keep max ziplist sizes to 1,024 elements with item sizes at 64 bytes or smaller. For many uses of HASHes that we’ve used so far, these limits should let you keep memory use down, and performance high.

As you develop solutions to problems outside of our examples, remember that if you can keep your LISTSETHASH, and ZSET sizes small, you can help keep your memory use low, which can let you use Redis for a wider variety of problems.

KEEPING KEY NAMES SHORTOne thing that I haven’t mentioned before is the use of minimizing the length of keys, both in the key portion of all values, as well as keys in HASHes, members of SETs and ZSETs, and entries in LISTs. The longer all of your strings are, the more data needs to be stored. Generally speaking, whenever it’s possible to store a relatively abbreviated piece of information like user:joe as a key or member, that’s preferable to storing username:joe, or even joe if user or username is implied. Though in some cases it may not make a huge difference, if you’re storing millions or billions of entries, those extra few megabytes or gigabytes may be useful later.

Now that you’ve seen that short structures in Redis can be used to reduce memory use, in the next section we’ll talk about sharding large structures to let us gain the benefits of the ziplist and intset optimizations for more problems.