e-Book - Redis in Action

This book covers the use of Redis, an in-memory database/data structure server.
  • Foreword
  • Preface
  • Acknowledgments
  • About this Book
  • About the Cover Illustration
  • Part 1: Getting Started
  • Part 2: Core concepts
  • Part 3: Next steps
  • Appendix A
  • Appendix B
  • Buy the paperback

    9.3.2 Storing packed data

    After we have our packed location codes, we only need to store them in STRINGs with
    SETRANGE. But before we do so, we have to think for a moment about how many users
    we’re going to be storing information about. For example, suppose that Twitter has 750
    million users today (based on the observation that recently created users have IDs
    greater than 750 million); we’d need over 1.5 gigabytes of space to store location information
    for all Twitter users. Though most operating systems are able to reasonably allocate
    large regions of memory, Redis limits us to 512 megabyte STRINGs, and due to
    Redis’s clearing out of data when setting a value beyond the end of an existing STRING,
    setting the first value at the end of a long STRING will take more time than would be
    expected for a simple SETBIT call. Instead, we can use a technique similar to what we
    used in section 9.2.1, and shard our data across a collection of STRINGs.

    Unlike when we were sharding HASHes and SETs, we don’t have to worry about
    being efficient by keeping our shards smaller than a few thousand elements, since we
    can access an element directly without decoding any others. Similarly, we can write to
    a given offset efficiently. Instead, our concern is more along the lines of being efficient
    at a larger scale—specifically what will balance potential memory fragmentation,
    as well as minimize the number of keys that are necessary. For this example, we’ll store
    location information for 220 users (just over 1 million entries) per STRING, which will
    use about 2 megabytes per STRING. In the next listing, we see the code for updating
    location information for a user.

    Listing 9.15A function for storing location data in sharded STRINGs
    USERS_PER_SHARD = 2**20

    Set the size of each shard.

    def set_location(conn, user_id, country, state):
       code = get_code(country, state)

    Get the location code to store for the user.

       shard_id, position = divmod(user_id, USERS_PER_SHARD)

    Find the shard ID and position of the user in the specific shard.

       offset = position * 2

    Calculate the offset of the user’s data.

       pipe = conn.pipeline(False)
       pipe.setrange('location:%s'%shard_id, offset, code)

    Set the value in the proper sharded location table.

       tkey = str(uuid.uuid4())
       pipe.zadd(tkey, 'max', user_id)
          [tkey, 'location:max'], aggregate='max')

    Update a ZSET that stores the maximum user ID seen so far.


    For the most part, there shouldn’t be anything surprising there. We calculate the location
    code to store for a user, calculate the shard and the individual shard offset for the
    user, and then store the location code in the proper location for the user. The only thing that’s strange and may not seem necessary is that we also update a ZSET that stores the highest-numbered user ID that has been seen. This is primarily important when calculating
    aggregates over everyone we have information about (so we know when to stop).