
    5.2.2 Storing statistics in Redis

    Truth be told, I’ve personally implemented five different methods of storing statistics in Redis. The method described here takes many of the good ideas from those methods and combines them in a way that allows for the greatest flexibility and opportunity to scale. What are we going to build?

    We’ll build a method to store statistics that have a similar scope to our
    log_common() function from section 5.1.2 (the current hour and the last hour). We’ll
    collect enough information to keep track of the minimum, maximum, average value,
    standard deviation, sample count, and the sum of values that we’re recording. We
    record so much information because we can just about guarantee that if we aren’t
    recording it, we’ll probably need it.

    For a given named context and type, we’ll store a group of values in a ZSET. We
    won’t use the ZSET for its ability to sort scores, but instead for its ability to be unioned
    against another ZSET, keeping only the MIN or MAX of items that intersect. The precise
    information that we’ll store for that context and type is the minimum value, the maximum
    value, the count of values, the sum of the values, and the sum of the squares of
    the values. With that information, we can calculate the average and standard deviation.
    Figure 5.3 shows an example of a ZSET holding this information for the ProfilePage
    context with statistics on AccessTime.
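
    As a concrete (made-up) example of what such a ZSET might hold, fetching
    everything from a hypothetical stats:ProfilePage:AccessTime key could
    return something like this; note that the members come back ordered by
    score, not by meaning:

    >>> conn.zrange('stats:ProfilePage:AccessTime', 0, -1, withscores=True)
    [('min', 0.035), ('max', 4.958), ('sum', 114.329), ('count', 120.0),
     ('sumsq', 258.973)]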

    Now that we know the type of data that we’ll be storing, how do we get the data in
    there? We’ll start like we did with our common logs by checking to make sure that our
    current data is for the correct hour, moving the old data to an archive if it’s not for the
    current hour. We’ll then construct two temporary ZSETs—one with the minimum

    Figure 5.3Example access time stats for the profile page. Remember that ZSETs are sorted by score, which is why our order seems strange compared to our description.

    value, the other with the maximum value—and ZUNIONSTORE them with the current
    stats with an aggregate of MIN and MAX, respectively. That’ll allow us to quickly update
    the data without needing to WATCH a potentially heavily updated stats key. After cleaning
    up those temporary ZSETs, we’ll then ZINCRBY the count, sum, and sumsq members
    of the statsZSET. Our code for performing this operation is shown next.

    Listing 5.6 The update_stats() function

    import time
    import uuid
    from datetime import datetime

    import redis

    # Note: this listing uses the older redis-py API that the book's examples
    # assume, where zadd() takes the member before the score and zincrby()
    # takes (key, member, amount).
    def update_stats(conn, context, type, value, timeout=5):
        # Set up the destination statistics key.
        destination = 'stats:%s:%s'%(context, type)
        start_key = destination + ':start'
        pipe = conn.pipeline(True)
        end = time.time() + timeout
        while time.time() < end:
            try:
                # Handle the current hour/last hour like in log_common():
                # if the stored start hour is older than the current hour,
                # move the old stats over to the archive keys.
                pipe.watch(start_key)
                now = datetime.utcnow().timetuple()
                hour_start = datetime(*now[:4]).isoformat()

                existing = pipe.get(start_key)
                pipe.multi()
                if existing and existing < hour_start:
                    pipe.rename(destination, destination + ':last')
                    pipe.rename(start_key, destination + ':pstart')
                    pipe.set(start_key, hour_start)

                tkey1 = str(uuid.uuid4())
                tkey2 = str(uuid.uuid4())
                # Add the value to the temporary keys.
                pipe.zadd(tkey1, 'min', value)
                pipe.zadd(tkey2, 'max', value)

                # Union the temporary keys with the destination stats key,
                # using the appropriate min/max aggregate.
                pipe.zunionstore(destination,
                    [destination, tkey1], aggregate='min')
                pipe.zunionstore(destination,
                    [destination, tkey2], aggregate='max')

                # Clean up the temporary keys.
                pipe.delete(tkey1, tkey2)
                # Update the count, sum, and sum of squares members
                # of the ZSET.
                pipe.zincrby(destination, 'count')
                pipe.zincrby(destination, 'sum', value)
                pipe.zincrby(destination, 'sumsq', value*value)

                # Return the base counter info so that the caller can do
                # something interesting if necessary.
                return pipe.execute()[-3:]
            except redis.exceptions.WatchError:
                # If the hour just turned over and the stats have already
                # been shuffled over, try again.
                continue
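
    As a quick usage sketch, a request handler could time itself and record
    the result. The context and type names here, and the process_view()
    helper, are hypothetical:

    start = time.time()
    process_view()      # hypothetical request handler
    update_stats(conn, 'ProfilePage', 'AccessTime', time.time() - start)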

    We can ignore almost all of the first half of the code listing, since it’s a verbatim copy
    of the rollover code from our log_common() function from section 5.1.2. The latter
    half does exactly what we described: creating temporary ZSETs, ZUNIONSTOREing them
    with our destination ZSET with the proper aggregates, cleaning the temporary ZSETs,
    and then adding our standard statistics information. But what about pulling the statistics
    information back out?

    To pull the information back out, we need to pull all of the values from the ZSET
    and then calculate the average and standard deviation. The average is simply the sum
    member divided by the count member. But the standard deviation is more difficult.
    With a bit of work, we can derive the standard deviation from the information we
    have, though for the sake of brevity I won’t explain the math behind it. Our code for
    fetching stats is shown here.
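
    (If you want the math anyway: sum((x - mean)**2) equals
    sumsq - sum**2 / count, so the sample standard deviation is
    sqrt((sumsq - sum**2 / count) / (count - 1)). The numerator line in the
    listing is exactly that identity.)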

    Listing 5.7 The get_stats() function

    def get_stats(conn, context, type):
        # Set up the key that we're fetching our statistics from.
        key = 'stats:%s:%s'%(context, type)
        # Fetch our basic statistics and package them as a dictionary.
        data = dict(conn.zrange(key, 0, -1, withscores=True))
        # Calculate the average.
        data['average'] = data['sum'] / data['count']
        # Prepare the first part of the calculation of standard deviation.
        numerator = data['sumsq'] - data['sum'] ** 2 / data['count']
        # Finish our calculation of standard deviation.
        data['stddev'] = (numerator / (data['count'] - 1 or 1)) ** .5
        return data
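
    As a sketch, with the same made-up numbers from the earlier example
    (results rounded):

    >>> stats = get_stats(conn, 'ProfilePage', 'AccessTime')
    >>> stats['average'], stats['stddev']
    (0.953, 1.123)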
    

    Aside from the calculation of the standard deviation, the get_stats() function isn’t
    surprising. And for those who’ve spent some time on the Wikipedia page for standard
    deviation, even calculating the standard deviation shouldn’t be all that surprising. But
    with all of this statistical information being stored, how do we know what information
    to look at? We’ll be answering that question and more in the next section.