e-Book - Redis in Action

This book covers the use of Redis, an in-memory database/data structure server.
  • Foreword
  • Preface
  • Acknowledgments
  • About this Book
  • About the Cover Illustration
  • Part 1: Getting Started
  • Part 2: Core concepts
  • Part 3: Next steps
  • Appendix A
  • Appendix B
  • Buy the paperback

    7.2.2 Non-numeric sorting with ZSETs

    Typical comparison operations between strings will examine two strings character by
    character until one character is different, one string runs out of characters, or until
    they’re found to be equal. In order to offer the same sort of functionality with string
    data, we need to turn strings into numbers. In this section, we’ll talk about methods of
    converting strings into numbers that can be used with Redis ZSETs in order to sort
    based on string prefixes. After reading this section, you should be able to sort your
    ZSET search results with strings.

    Our first step in translating strings into numbers is understanding the limitations
    of what we can do. Because Redis uses IEEE 754 double-precision floating-point values
    to store scores, we’re limited to at most 64 bits worth of storage. Due to some subtleties
    in the way doubles represent numbers, we can’t use all 64 bits. Technically, we
    could use more than the equivalent of 63 bits, but that doesn’t buy us significantly more than 63 bits, and for our case, we’ll only use 48 bits for the sake of simplicity.
    Using 48 bits limits us to prefixes of 6 bytes on our data, which is often sufficient.

    To convert our string into an integer, we’ll trim our string down to six characters (as
    necessary), converting each character into its ASCII value. We’ll then extend the values
    to six entries for strings shorter than six characters. Finally, we’ll combine all of the values
    into an integer. Our code for converting a string into a score can be seen next.

    Listing 7.8A function to turn a string into a numeric score
    def string_to_score(string, ignore_case=False):
    
        if ignore_case:
            string = string.lower()
    
    

    We can handle optional caseinsensitive indexes easily, so we will.

        pieces = map(ord, string[:6])
    

    Convert the first six characters of the string into their numeric values, null being 0, tab being 9, capital A being 65, and so on.

        while len(pieces) < 6:
            pieces.append(-1)
    
    

    For strings that aren’t at least six characters long, we’ll add placeholder values to represent that the strings are short.

        score = 0
    
        for piece in pieces:
            score = score * 257 + piece + 1
    
    

    For each value in the converted string values, we add it to the score, taking into consideration that a null is different from a placeholder.

        return score * 2 + (len(string) > 6)
    

    Because we have an extra bit, we can also signify whether the string is exactly six characters or more, allowing us to differentiate “robber” and “robbers”, though not “robbers” and “robbery”.

    Most of our string_to_score() function should be straightforward, except for
    maybe our use of -1 as a filler value for strings shorter than six characters, and our use
    of 257 as a multiplier before adding each character value to the score. For many applications,
    being able to differentiate between hello\0 and hello can be important, so
    we take steps to differentiate the two, primarily by adding 1 to all ASCII values (making
    null 1), and using 0 (-1 + 1) as a filler value for short strings. As a bonus, we use an
    extra bit to tell us whether a string is more than six characters long, which helps us
    with similar six-character prefixes.2

    By mapping strings to scores, we’re able to get a prefix comparison of a little more
    than the first six characters of our string. For non-numeric data, this is more or less
    what we can reasonably do without performing extensive numeric gymnastics, and
    without running into issues with how a non-Python library transfers large integers
    (that may or may not have been converted to a double).

    When using scores derived from strings, the scores aren’t always useful for combining
    with other scores and are typically only useful for defining a single sort order.

    Exercise: Autocompleting with strings as scores

    Back in section 6.1.2, we used ZSETs with scores set to 0 to allow us to perform
    prefix matching on user names. We had to add items to the ZSET and either use
    WATCH/MULTI/EXEC or the lock that we covered in section 6.2 to make sure that we
    fetched the correct range. But if instead we added names with scores being the result
    of string_to_score() on the name itself, we could bypass the use of WATCH/
    MULTI/EXEC and locks when someone is looking for a prefix of at most six characters
    by using ZRANGEBYSCORE, with the endpoints we had calculated before being converted
    into scores as just demonstrated. Try rewriting our find_prefix_range() and
    autocomplete_on_prefix() functions to use ZRANGEBYSCORE instead.

    Exercise: Autocompleting with longer strings

    In this section and for the previous exercise, we converted arbitrary binary strings to
    scores, which limited us to six-character prefixes. By reducing the number of valid
    characters in our input strings, we don’t need to use a full 8+ bits per input character.
    Try to come up with a method that would allow us to use more than a six-character
    prefix if we only needed to autocomplete on lowercase alphabetic characters.

    that this is because the score that we produced from the string doesn’t really mean
    anything, aside from defining a sort order.

    Now that we can sort on arbitrary data, and you’ve seen how to use weights to
    adjust and combine numeric data, you’re ready to read about how we can use Redis
    SETs and ZSETs to target ads.

    2 If we didn’t care about differentiating between hello\0 and hello, then we wouldn’t need the filler. If we
    didn’t need the filler, we could replace our multiplier of 257 with 256 and get rid of the +1 adjustment. But
    with the filler, we actually use .0337 additional bits to let us differentiate short strings from strings that have
    nulls. And when combined with the extra bit we used to distinguish whether we have strings longer than six
    characters, we actually use 49.0337 total bits.