EBOOK – REDIS IN ACTION

This book covers the use of Redis, an in-memory database/data structure server.


9.1.3 Performance issues for long ziplists and intsets

As our structures grow beyond the ziplist and intset limits, they’re automatically converted into their more typical underlying structure types. This is done primarily because manipulating the compact versions of these structures can become slow as they grow longer.
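You can watch this conversion happen from redis-cli with the OBJECT ENCODING command. The transcript below is illustrative only (the key name test:list is made up, and the encoding names shown apply to pre-3.2 Redis, where LISTs switch from ziplist to linkedlist):

```text
127.0.0.1:6379> RPUSH test:list a b c
(integer) 3
127.0.0.1:6379> OBJECT ENCODING test:list
"ziplist"
127.0.0.1:6379> (push more than list-max-ziplist-entries items...)
127.0.0.1:6379> OBJECT ENCODING test:list
"linkedlist"
```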

For a firsthand look at how this happens, let’s start by updating our setting for
list-max-ziplist-entries to 110,000. This setting is a lot larger than we’d ever use
in practice, but it does help to highlight the issue. After that setting is updated and
Redis is restarted, we’ll benchmark Redis to discover performance issues that can happen
when long ziplist-encoded LISTs are used.
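One way to make that change is to edit redis.conf before restarting. Note that this option name applies to Redis versions that still ziplist-encode LISTs; Redis 3.2 and later replaced it with list-max-ziplist-size as part of the quicklist rework:

```text
# In redis.conf (pre-3.2 Redis), before restarting:
list-max-ziplist-entries 110000
```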

To benchmark the behavior of long ziplist-encoded LISTs, we’ll write a function
that creates a LIST with a specified number of elements. After the LIST is created,
we’ll repeatedly call the RPOPLPUSH command to move an item from the right end of
the LIST to the left end. This will give us a lower bound on how expensive commands
can be on very long ziplist-encoded LISTs. This benchmarking function is shown in
the next listing.

Listing 9.5 Our code to benchmark varying sizes of ziplist-encoded LISTs
import time

def long_ziplist_performance(conn, key, length, passes, psize):
   # We'll parameterize everything so that we can measure
   # performance in a variety of ways.

   # Start by deleting the named key to ensure that we only
   # benchmark exactly what we intend to.
   conn.delete(key)

   # Initialize the LIST by pushing our desired count of
   # numbers onto the right end.
   conn.rpush(key, *range(length))

   # Prepare a pipeline so that we are less affected by
   # network round-trip times.
   pipeline = conn.pipeline(False)

   # Start the timer.
   t = time.time()

   for p in xrange(passes):
      # Each pipeline execution will include psize actual
      # calls to RPOPLPUSH.
      for pi in xrange(psize):
         # Each call pops the rightmost item from the LIST and
         # pushes it onto the left end of the same LIST.
         pipeline.rpoplpush(key, key)

      # Execute the psize calls to RPOPLPUSH.
      pipeline.execute()

   # Calculate the number of calls per second that are performed.
   return (passes * psize) / (time.time() - t or .001)

As mentioned before, this code creates a LIST of a given size and then executes a number
of RPOPLPUSH commands in pipelines. By computing the number of calls to RPOPLPUSH
divided by the amount of time it took, we can calculate a number of operations per
second that can be executed on ziplist-encoded LISTs of a given size. Let’s run this with
steadily increasing list sizes to see how long ziplists can reduce performance.

Listing 9.6 As ziplist-encoded LISTs grow, we can see performance drop
>>> long_ziplist_performance(conn, 'list', 1, 1000, 100)
52093.558416505381
>>> long_ziplist_performance(conn, 'list', 100, 1000, 100)
51501.154762768667
>>> long_ziplist_performance(conn, 'list', 1000, 1000, 100)
49732.490843316067

With lists encoded as ziplists at 1000 entries or smaller, Redis is still able to perform around 50,000 operations per second or better.

>>> long_ziplist_performance(conn, 'list', 5000, 1000, 100)
43424.056529592635
>>> long_ziplist_performance(conn, 'list', 10000, 1000, 100)
36727.062573334966

But as ziplist-encoded lists grow to 5,000 or more entries, performance starts to drop off as memory-copy costs rise.

>>> long_ziplist_performance(conn, 'list', 50000, 1000, 100)
16695.140684975777

When we hit 50,000 entries in a ziplist, performance has dropped significantly.

>>> long_ziplist_performance(conn, 'list', 100000, 500, 100)
553.10821080054586

And when we hit 100,000 entries, ziplists are effectively unusable.

At first glance, you may be thinking that this isn’t so bad even when you let a ziplist
grow to a few thousand elements. But this shows only a single example operation,
where all we’re doing is taking items off of the right end and pushing them to the left
end. The ziplist encoding can find the right or left end of a sequence quickly (though
shifting all of the items over for the insert is what slows us down), and for this small example we can exploit our CPU caches. But when scanning through a list for a particular
value, like our autocomplete example from section 6.1, or fetching/updating
individual fields of a HASH, Redis will have to decode many individual entries, and CPU
caches won’t be as effective. As a data point, replacing our RPOPLPUSH command
with a call to LINDEX that fetches an element from the middle of the LIST yields
roughly half the operations per second of the RPOPLPUSH benchmark once LISTs
reach 5,000 items. Feel free to try it for yourself.
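If you want to try that comparison, a minimal sketch of the LINDEX variant might look like the following. This helper isn't from the book; long_ziplist_index_performance is a hypothetical name, and it assumes a redis-py connection like the one used in listing 9.5:

```python
import time

def long_ziplist_index_performance(conn, key, length, passes, psize):
    # Build the LIST, exactly as in listing 9.5.
    conn.delete(key)
    conn.rpush(key, *range(length))
    pipeline = conn.pipeline(False)

    t = time.time()
    for _ in range(passes):
        for _ in range(psize):
            # Fetching the middle element forces Redis to decode
            # roughly half of the ziplist entries on every call.
            pipeline.lindex(key, length // 2)
        pipeline.execute()
    # Calls per second, guarding against a zero elapsed time.
    return (passes * psize) / (time.time() - t or .001)
```

Comparing its output against long_ziplist_performance for the same lengths shows how much extra work decoding entries from the middle of a long ziplist costs.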

If you keep your max ziplist sizes in the 500–2,000 item range, and you keep the
max item size under 128 bytes or so, you should get reasonable performance. I personally
try to keep max ziplist sizes to 1,024 elements with item sizes at 64 bytes or
smaller. For the HASH use cases we've covered so far, these limits should keep
memory use down and performance high.
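In redis.conf terms, those personal defaults would look something like the fragment below (again assuming a pre-3.2 Redis for the LIST option names):

```text
hash-max-ziplist-entries 1024
hash-max-ziplist-value 64
list-max-ziplist-entries 1024
list-max-ziplist-value 64
```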

As you develop solutions to problems outside of our examples, remember that if
you can keep your LIST, SET, HASH, and ZSET sizes small, you can help keep your memory
use low, which can let you use Redis for a wider variety of problems.

KEEPING KEY NAMES SHORT One thing that I haven’t mentioned before is the
importance of minimizing the length of keys: the key portion of all values, as
well as keys in HASHes, members of SETs and ZSETs, and entries in LISTs. The
longer all of your strings are, the more data needs to be stored. Generally
speaking, whenever it’s possible to store a relatively abbreviated piece of
information like user:joe as a key or member, that’s preferable to storing
username:joe, or even just joe if user or username is implied. Though in some
cases it may not make a huge difference, if you’re storing millions or billions
of entries, those extra few megabytes or gigabytes may be useful later.
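A quick back-of-the-envelope calculation (my own illustration, with made-up key names) shows how fast a few bytes per entry add up:

```python
# Hypothetical example: savings from a shorter key prefix.
short_key = 'user:joe'        # 8 bytes
long_key = 'username:joe'     # 12 bytes
saved_per_entry = len(long_key) - len(short_key)   # 4 bytes

# Across a billion entries, those 4 bytes per entry add up
# to several gigabytes of raw string data.
entries = 1000000000
total_saved_gib = saved_per_entry * entries / 2.0 ** 30
print(round(total_saved_gib, 2))
```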

Now that you’ve seen that short structures in Redis can be used to reduce memory
use, in the next section we’ll talk about sharding large structures to let us gain the
benefits of the ziplist and intset optimizations for more problems.