EBOOK – REDIS IN ACTION

This book covers the use of Redis, an in-memory database/data structure server.


5.2.3 Simplifying our statistics recording and discovery

Now we have our statistics stored in Redis—what next? More specifically, now that we
have information about (for example) access time on every page, how do we discover
which pages take a long time on average to generate? Or how do we know when it
takes significantly longer to generate a page than it did on previous occasions? The
simple answer is that we need to store more information in a way that lets us discover
when both situations happen, which we’ll explore in this section.

If we want to record access times, then we need to calculate access times. We can
spend our time adding access time calculations in various places and then adding
code to record the access times, or we can implement something to help us to calculate
and record the access times. That same helper could then also make that information
available in (for example) a ZSET of the slowest pages to access on average, and
could even report on pages that take a long time to access compared to other times
that page was accessed.

To help us calculate and record access times, we’ll write a Python context manager[1]
that wraps the code whose access times we want to calculate and record.
This context manager will get the current time, let the wrapped code execute, and
then calculate the total time of execution, record it in Redis, and also update a ZSET of
the highest access time contexts. The next listing shows our context manager for performing
this set of operations.
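It can help to see the read side first. Below is a minimal sketch of pulling the slowest contexts back out of the 'slowest:AccessTime' ZSET that listing 5.8 maintains. It assumes a redis-py client (or any object with a compatible zrevrange() method). slowest_contexts() is a hypothetical helper, not part of the book's code.

```python
from typing import List, Tuple


def slowest_contexts(conn, count: int = 10) -> List[Tuple[str, float]]:
    """Return up to `count` (context, average access time) pairs, slowest first.

    Hypothetical helper: assumes `conn` is a redis-py client and that the
    'slowest:AccessTime' ZSET is scored by average access time in seconds.
    """
    if count <= 0:
        # ZREVRANGE with an end index of -1 would return the whole ZSET.
        return []
    # ZREVRANGE walks the ZSET from highest score to lowest, so the slowest
    # contexts come first; the end index is inclusive, hence count - 1.
    pairs = conn.zrevrange('slowest:AccessTime', 0, count - 1, withscores=True)
    # redis-py returns member names as bytes unless decode_responses=True.
    return [(m.decode() if isinstance(m, bytes) else m, float(s))
            for m, s in pairs]
```

Because the ZSET holds averages rather than single measurements, a page that is slow only once in a while will not dominate this list.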

Listing 5.8 The access_time() context manager

import contextlib
import time

# Make this Python generator into a context manager.
@contextlib.contextmanager
def access_time(conn, context):
    # Record the start time.
    start = time.time()
    # Let the block of code that we're wrapping run.
    yield
    # Calculate the time that the block took to execute.
    delta = time.time() - start
    # Update the stats for this context. update_stats() is defined earlier
    # in this chapter; its result starts with the updated count and sum.
    stats = update_stats(conn, context, 'AccessTime', delta)
    # Calculate the average.
    average = stats[1] / stats[0]

    pipe = conn.pipeline(True)
    # Add the average to a ZSET that holds the slowest access times.
    # (redis-py 3.0 and later take a {member: score} mapping here.)
    pipe.zadd('slowest:AccessTime', {context: average})
    # Keep the slowest 100 items in the AccessTime ZSET.
    pipe.zremrangebyrank('slowest:AccessTime', 0, -101)
    pipe.execute()
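The trimming call can be puzzling at first. ZSET ranks count from the lowest score (rank 0), and negative ranks count back from the highest (rank -1), so removing ranks 0 through -101 deletes everything except the 100 highest-scoring members. The pure-Python sketch below mimics that behavior on a plain dict so it runs without a Redis server; trim_to_slowest() is a hypothetical stand-in, not a Redis or redis-py function.

```python
def trim_to_slowest(scores: dict, keep: int = 100) -> dict:
    """Mimic ZREMRANGEBYRANK key 0 -(keep + 1) on a {member: score} dict.

    Redis orders ZSET members by ascending score (ties broken by member
    name), so rank 0 is the fastest context and rank -1 the slowest.
    """
    # Sort ascending the way Redis does: by score, then by member.
    ordered = sorted(scores.items(), key=lambda item: (item[1], item[0]))
    # Ranks 0 .. -(keep + 1) cover every member except the last `keep`,
    # so only the `keep` highest-scoring (slowest) members survive.
    # With fewer than `keep` members, nothing is removed.
    survivors = ordered[-keep:] if keep > 0 else []
    return dict(survivors)
```

Trimming on every call keeps the ZSET bounded, at the cost of one extra command per request inside the same pipeline.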

There’s some magic going on in the access_time() context manager, and it’ll probably
help to see it in use to understand what’s going on. The following code shows the
access_time() context manager being used to record access times of web pages that
are served through a callback method, similar to the middleware-layer or plugin
callbacks we used in our examples from chapter 2:

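def process_view(conn, callback):
    # This web view takes the Redis connection as well as a callback to
    # generate content. 'request' is assumed to be the current request
    # object provided by the web framework.
    with access_time(conn, request.path):
        # This is how we'd use the access_time() context manager to wrap a
        # block of code. This body runs when the context manager reaches its
        # yield statement; returning from here still resumes access_time()
        # after the yield, so the elapsed time is recorded.
        return callback()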
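To see exactly where the wrapped block runs relative to the timing code, here is a stdlib-only sketch that swaps Redis for a plain list and dict; traced(), view(), events, and timings are hypothetical names used only for illustration. It shows that returning from inside the with block still resumes the generator after its yield.

```python
import contextlib
import time

events = []   # order in which each part of the code ran
timings = {}  # label -> elapsed seconds, standing in for Redis


@contextlib.contextmanager
def traced(label):
    # Runs when the with statement is entered.
    start = time.time()
    events.append(('enter', label))
    yield
    # Runs after the with block finishes, even if it finished with `return`.
    timings[label] = time.time() - start
    events.append(('exit', label))


def view():
    with traced('/home'):
        events.append(('body', '/home'))
        return 'page content'


result = view()
```

One consequence worth knowing: if the wrapped block raises an exception, the code after yield is skipped, because neither traced() nor listing 5.8 catches the exception or uses try/finally. Requests that fail are therefore not timed.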

After seeing the example, even if you don’t yet understand how to create a context
manager, you should at least know how to use one. In this example, we used the access
time context manager to calculate the total time to generate a web page. This context
manager could also be used to record the time it takes to make a database query or
the amount of time it takes to render a template. As an exercise, can you think of
other types of context managers that could record statistics that would be useful? Or
can you add reporting of access times that are more than two standard deviations
above average to the recent_log()?
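For the second exercise, one possible starting point is to derive the standard deviation from running totals. This sketch assumes update_stats() returns the updated count, sum, and sum of squares in that order (access_time() itself only relies on the first two). is_outlier() is a hypothetical helper; writing the result to the recent log is left as the exercise describes.

```python
import math


def is_outlier(count: float, total: float, sumsq: float, value: float,
               threshold: float = 2.0) -> bool:
    """Return True if `value` is more than `threshold` standard deviations
    above the mean described by (count, total, sumsq).

    Hypothetical helper: assumes the three totals are the count, sum, and
    sum-of-squares values maintained by update_stats().
    """
    if count < 2:
        # A single sample has no meaningful spread.
        return False
    mean = total / count
    # Sample variance from running sums: (sum(x^2) - (sum(x))^2 / n) / (n - 1).
    variance = (sumsq - total * total / count) / (count - 1)
    # Floating-point rounding can push a zero variance slightly negative.
    stddev = math.sqrt(max(variance, 0.0))
    return value > mean + threshold * stddev
```

Inside access_time(), this could be called as is_outlier(stats[0], stats[1], stats[2], delta) right after update_stats(). Note that the totals already include the current measurement, which slightly raises the mean and spread it is compared against.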

GATHERING STATISTICS AND COUNTERS IN THE REAL WORLD

I know that we just spent several pages talking about how to gather fairly important
statistics about how our production systems operate, but let me remind you that there
are preexisting software packages designed for collecting and plotting counters and
statistics. My personal favorite is Graphite (http://graphite.wikidot.com/), which you
should probably download and install before spending too much time building your own
data-plotting library.

Now that we’ve been recording diverse and important information about the state of
our application in Redis, knowing more about our visitors can help us answer
other questions.

1 In Python, a context manager is a specially defined function or class that will have parts of it executed before
and after a given block of code is executed. This allows, for example, the easy opening and automatic closing
of files.