EBOOK – REDIS IN ACTION

This book covers the use of Redis, an in-memory database/data structure server.

open all | close all

2.4 Database row caching

In this chapter so far, we’ve moved login and visitor sessions from our relational database and web browser to Redis, we’ve taken shopping carts out of the relational database and put them into Redis, and we’ve cached entire pages in Redis. This has helped us improve performance and reduce the load on our relational database, which has also lowered our costs.

Individual product pages that we’re displaying to a user typically only load one or two rows from the database: the user information for the user who’s logged in (with our generated pages, we can load that with an AJAX call to keep using our cache), and the information about the item itself. Even for pages where we may not want to cache the whole page (customer account pages, a given user’s past orders, and so on), we could instead cache the individual rows from our relational database.

As an example of where caching rows like this would be useful, let’s say that Fake Web Retailer has decided to start a new promotion to both clean out some older inventory and get people coming back to spend money. To make this happen, we’ll start performing daily deal sales for certain items until they run out. In the case of a deal, we can’t cache the full page, because then someone might see a version of the page with an incorrect count of items remaining. And though we could keep reading the item’s row from the database, that could push our database to become overutilized, which would then increase our costs because we’d need to scale our databases up again.

Figure 2.1 A cached database row for an item to be sold online

To cache database rows in preparation for a heavy load, we’ll write a daemon function that will run continuously, whose purpose will be to cache specific database rows in Redis, updating them on a variable schedule. The rows themselves will be JSON-encoded dictionaries stored as a plain Redis value. We’ll map column names and row values to the dictionary keys and values. An example row can be seen in figure 2.1.

In order to know when to update the cache, we’ll use two ZSETs. Our first ZSET, the scheduleZSET, will use the row ID from the original database row as the member of the ZSET. We’ll use a timestamp for our schedule scores, which will tell us when the row should be copied to Redis next. Our second ZSET, the delayZSET, will use the same row ID for the members, but the score will be how many seconds to wait between cache updates.

Using Json Instead of Other FormatsOur use of JSON instead of XML, Google’s protocol buffers, Thrift, BSON, MessagePack, or other serialization formats is a subjective one. We generally use JSON because it’s human readable, somewhat concise, and it has fast encoding and decoding libraries available in every language with an existing Redis client (as far as we know). If your situation requires the use of another format, or if you’d prefer to use a different format, then feel free to do so.

Nested StructuresOne feature that users of other non-relational databases sometime expect is the ability to nest structures. Specifically, some new users of Redis expect that a HASH should be able to have a value that’s a ZSET or LIST. Though conceptually this is fine, there’s a question that comes up early in such a discussion that boils down to a simple example, “How do I increment a value in a HASH that’s nested five levels deep?” As a matter of keeping the syntax of commands simple, Redis doesn’t allow nested structures. If necessary, you can use key names for this (user:123 could be a HASH and user:123:posts could be a ZSET of recent posts from that user). Or you can explicitly store your nested structures using JSON or some other serialization library of your choice (Lua scripting, covered in chapter 11, supports serverside manipulation of JSON and MessagePack encoded data).

In order for rows to be cached on a regular basis by the caching function, we’ll first add the row ID to our delay ZSET with the given delay. This is because our actual caching function will require the delay, and if it’s missing, will remove the scheduled item. When the row ID is in the delay ZSET, we’ll then add the row ID to our schedule ZSET with the current timestamp. If we want to stop a row from being synced to Redis and remove it from the cache, we can set the delay to be less than or equal to 0, and our caching function will handle it. Our function to schedule or stop caching can be seen in the following listing.

Listing 2.7 The schedule_row_cache() function
def schedule_row_cache(conn, row_id, delay):
	conn.zadd('delay:', row_id, delay)

Set the delay for the item first.

	conn.zadd('schedule:', row_id, time.time())

Schedule the item to be cached now.

Now that we have the scheduling part done, how do we cache the rows? We’ll pull the first item from the schedule ZSET with its score. If there are no items, or if the timestamp returned is in the future, we’ll wait 50 milliseconds and try again. When we have an item that should be updated now, we’ll check the row’s delay. If the delay for the next caching time is less than or equal to 0, we’ll remove the row ID from the delay and schedule ZSETs, as well as delete the cached row and try again. Finally, for any row that should be cached, we’ll update the row’s schedule, pull the row from the database, and save a JSON-encoded version of the row to Redis. Our function for doing this can be seen in this listing.

Listing 2.8 The cache_rows() daemon function
def cache_rows(conn):
	while not QUIT:
		next = conn.zrange('schedule:', 0, 0, withscores=True)

Find the next row that should be cached (if any), including the timestamp, as a list of tuples with zero or one items.

		now = time.time()
		if not next or next[0][1] > now:
			time.sleep(.05)

No rows can be cached now, so wait 50 milliseconds and try again.

			continue

		row_id = next[0][0]
		delay = conn.zscore('delay:', row_id)

Get the delay before the next schedule.

		if delay <= 0:
			conn.zrem('delay:', row_id)
			conn.zrem('schedule:', row_id)
			conn.delete('inv:' + row_id)

The item shouldn’t be cached anymore; remove it from the cache.

			continue

		row = Inventory.get(row_id)

Get the database row.

		conn.zadd('schedule:', row_id, now + delay)
		conn.set('inv:' + row_id, json.dumps(row.to_dict()))

Update the schedule and set the cache value.

With the combination of a scheduling function and a continuously running caching function, we’ve added a repeating scheduled autocaching mechanism. With these two functions, inventory rows can be updated as frequently as we think is reasonable. For a daily deal with inventory counts being reduced and affecting whether someone can buy the item, it probably makes sense to update the cached row every few seconds if there are many buyers. But if the data doesn’t change often, or when back-ordered items are acceptable, it may make sense to only update the cache every minute. Both are possible with this simple method.

Now that we’re caching individual rows in Redis, could it be possible to further reduce our memory load by caching only some of our pages?