This book covers the use of Redis, an in-memory database/data structure server.

open all | close all

7.2.1 Sorting search results with ZSETs

As we saw in chapter 1 and talked about in chapter 3, SETs can actually be provided as
arguments to the ZSET commands ZINTERSTORE and ZUNIONSTORE. When we pass SETs
to these commands, Redis will consider the SET members to have scores of 1. For now,
we aren’t going to worry about the scores of SETs in our operations, but we will later.

In this section, we’ll talk about using SETs and ZSETs together for a two-part searchand-
sort operation. When you’ve finished reading this section, you’ll understand why
and how we’d want to combine scores together as part of a document search.

Let’s consider a situation in which we’ve already performed a search and have our
result SET. We can sort our results with the SORT command, but that means we can
only sort based on a single value at a time. Being able to easily sort by a single value is
one of the reasons why we started out sorting with our indexes in the first place.

But say that we want to add the ability to vote on our knowledge base articles to
indicate if they were useful. We could put the vote count in the article HASH and use
SORT as we did before. That’s reasonable. But what if we also wanted to sort based on a
combination of recency and votes? We could do as we did in chapter 1 and predefine
the score increase for each vote. But if we don’t have enough information about how
much scores should increase with each vote, then picking a score early on will force us
to have to recalculate later when we find the right number.

Instead, we’ll keep a ZSET of the times that articles were last updated, as well as a
ZSET for the number of votes that an article has received. Both will use the article IDs
of the knowledge base articles as members of the ZSETs, with update times or vote
count as scores, respectively. We’ll also pass similar arguments to an updated search_and_zsort() function defined in the next listing, in order to calculate the
resulting sort order for only update times, only vote counts, or almost any relative balance
between the two.

Listing 7.6An updated function to search and sort based on votes and updated times
def search_and_zsort(conn, query, id=None, ttl=300, update=1, vote=0,
                    start=0, num=20, desc=True):

Like before, we’ll optionally take a previous result ID for pagination if the result is still available.

    if id and not conn.expire(id, ttl):
        id = None

We’ll refresh the search result’s TTL if possible.

    if not id:
        id = parse_and_search(conn, query, ttl=ttl)

If our search result expired, or if this is the first time we’ve searched, perform the standard SET search.

        scored_search = {
            id: 0,

We use the “id” key for the intersection, but we don’t want it to count toward weights.

            'sort:update': update,
            'sort:votes': vote

Set up the scoring adjustments for balancing update time and votes. Remember: votes can be adjusted to 1, 10, 100, or higher, depending on the sorting result desired.

        id = zintersect(conn, scored_search, ttl)

Intersect using our helper function that we define in listing 7.7.

    pipeline = conn.pipeline(True)
    pipeline.zcard('idx:' + id)

Fetch the size of the result ZSET.

    if desc:
        pipeline.zrevrange('idx:' + id, start, start + num - 1)
        pipeline.zrange('idx:' + id, start, start + num - 1)

Handle fetching a “page” of results.

    results = pipeline.execute()

    return results[0], results[1], id

Return the results and the ID for pagination.

Our search_and_zsort() works much like the earlier search_and_sort(), differing
primarily in how we sort/order our results. Rather than calling SORT, we perform a
ZINTERSTORE operation, balancing the search result SET, the updated time ZSET, and
the vote ZSET.

As part of search_and_zsort(), we used a helper function for handling the creation
of a temporary ID, the ZINTERSTORE call, and setting the expiration time of the
result ZSET. The zintersect() and zunion() helper functions are shown next.

Listing 7.7Some helper functions for performing ZSET intersections and unions
def _zset_common(conn, method, scores, ttl=30, **kw):
    id = str(uuid.uuid4())

Create a new temporary identifier.

    execute = kw.pop('_execute', True)

Allow the passing of an argument to determine whether we should defer pipeline execution.

    pipeline = conn.pipeline(True) if execute else conn

Set up a transactional pipeline so that we have consistent results for each individual call.

    for key in scores.keys():
        scores['idx:' + key] = scores.pop(key)

Add the ‘idx:’ prefix to our inputs.

        getattr(pipeline, method)('idx:' + id, scores, **kw)

Set up the call for one of the operations.

        pipeline.expire('idx:' + id, ttl)

Instruct Redis to expire the ZSET in the future.

        if execute:

Actually execute the operation, unless explicitly instructed not to by the caller.

        return id

Return the ID for the caller to process the results.

def zintersect(conn, items, ttl=30, **kw):
    return _zset_common(conn, 'zinterstore', dict(items), ttl, **kw)

Helper function to perform ZSET intersections.

def zunion(conn, items, ttl=30, **kw):
    return _zset_common(conn, 'zunionstore', dict(items), ttl, **kw)

Helper function to perform ZSET unions.

These helper functions are similar to our SET-based helpers, the primary difference
being that we’re passing a dictionary through to specify scores, so we need to do more
work to properly prefix all of our input keys.

Exercise: Article voting

In this section, we used ZSETs to handle combining a time and a vote count for an
article. You remember that we did much the same thing back in chapter 1 without
search, though we did handle groups of articles. Can you update article_vote(),
post_articles(), get_articles(), and get_group_articles() to use this new
method so that we can update our score per vote whenever we want?

In this section, we talked about how to combine SETs and ZSETs to calculate a simple
composite score based on vote count and updated time. Though we used 2 ZSETs as
sources for scores, there’s no reason why we couldn’t have used 1 or 100. It’s all a question
of what we want to calculate.

If we try to fully replace SORT and HASHes with the more flexible ZSET, we run into one
problem almost immediately: scores in ZSETs must be floating-point numbers. But we
can handle this issue in many cases by converting our non-numeric data to numbers.