EBOOK – REDIS IN ACTION

This book covers the use of Redis, an in-memory database/data structure server.

  • Foreword
  • Preface
  • Acknowledgments
  • About this Book
  • About the Cover Illustration
  • Part 1: Getting Started
  • Part 2: Core concepts
  • Part 3: Next steps
  • Appendix A
  • Appendix B
  • Buy the paperback

    7.3.3 Targeting ads

    As described earlier, when we receive a request to target an ad, we’re generally looking
    to find the highest eCPM ad that matches the viewer’s location. In addition to
    matching an ad based on a location, we’ll gather and use statistics about how the content
    of a page matches the content of an ad, and what that can do to affect the ad’s
    CTR. Using these statistics, content in an ad that matches a web page can result in
    bonuses that contribute to the calculated eCPM of CPC and CPA ads, which can result
    in those types of ads being shown more.

    Before showing any ads, we won’t have any bonus scoring for any of our web page
    content. But as we show ads, we’ll learn about what terms in the ads help or hurt the
    ad’s expected performance, which will allow us to change the relative value of each of
    the optional targeting terms.

    To execute the targeting, we’ll union all of the relevant location SETs to produce an
    initial group of ads that should be shown to the viewer. Then we’ll parse the content of
    the page on which the ad will be shown, add any relevant bonuses, and finally calculate
    a total eCPM for all ads that a viewer could be shown. After we’ve calculated those
    eCPMs, we’ll fetch the ad ID for the highest eCPM ad, record some statistics about our
    targeting, and return the ad itself. Our code for targeting an ad looks like this.

    Listing 7.11Ad targeting by location and page content bonuses
    def target_ads(conn, locations, content):
        pipeline = conn.pipeline(True)
    
        matched_ads, base_ecpm = match_location(pipeline, locations)
    

    Find all ads that fit the location targeting parameter, and their eCPMs.

        words, targeted_ads = finish_scoring(
    

    Finish any bonus scoring based on matching the content.

            pipeline, matched_ads, base_ecpm, content)
    
    

    Get an ID that can be used for reporting and recording of this particular ad target.

        pipeline.incr('ads:served:')
    
        pipeline.zrevrange('idx:' + targeted_ads, 0, 0)
    

    Fetch the top-eCPM ad ID.

        target_id, targeted_ad = pipeline.execute()[-2:]
    
        if not targeted_ad:
            return None, None
    
    

    If there were no ads that matched the location targeting, return nothing.

        ad_id = targeted_ad[0]
    
        record_targeting_result(conn, target_id, ad_id, words)
    
    

    Record the results of our targeting efforts as part of our learning process.

        return target_id, ad_id
    

    Return the target ID and the ad ID to the caller.

    In this first version, we hide away the details of exactly how we’re matching based on
    location and adding bonuses based on terms so that we can understand the general
    flow. The only part I didn’t mention earlier is that we’ll also generate a target ID, which
    is an ID that represents this particular execution of the ad targeting. This ID will allow
    us to track clicks later, helping us learn about which parts of the ad targeting may have
    contributed to the overall total clicks.

    As mentioned earlier, in order to match based on location, we’ll perform a SET
    union over the locations (city, state, country) that the viewer is coming from. While
    we’re here, we’ll also calculate the base eCPM of these ads without any bonuses
    applied. The code for performing this operation is shown next.

    Listing 7.12A helper function for targeting ads based on location
    def match_location(pipe, locations):
    
        required = ['req:' + loc for loc in locations]
    

    Calculate the SET key names for all of the provided locations.

        matched_ads = union(pipe, required, ttl=300, _execute=False)
    

    Calculate the SET of matched ads that are valid for this location.

        return matched_ads, zintersect(pipe,
            {matched_ads: 0, 'ad:value:': 1}, _execute=False)
    

    Return the matched ads SET ID, as well as the ID of the ZSET that includes the base eCPM of all of the matched ads.

    This code listing does exactly what I said it would do: it finds ads that match the location
    of the viewer, and it calculates the eCPM of all of those ads without any bonuses
    based on page content applied. The only thing that may be surprising about this listing
    is our passing of the funny _execute keyword argument to the zintersect()
    function, which delays the actual execution of calculating the eCPM of the ad until
    later. The purpose of waiting until later is to help minimize the number of round trips
    between our client and Redis.

    CALCULATING TARGETING BONUSES

    The interesting part of targeting isn’t the location matching; it’s calculating the
    bonuses. We’re trying to discover the amount to add to an ad’s eCPM, based on words
    in the page content that matched words in the ad. We’ll assume that we’ve precalculated
    a bonus for each word in each ad (as part of the learning phase), stored in ZSETs for each
    word, with members being ad IDs and scores being what should be added to the eCPM.

    These word-based, per-ad eCPM bonuses have values such that the average eCPM of
    the ad, when shown on a page with that word, is the eCPM bonus from the word plus
    the known average CPM for the ad. Unfortunately, when more than one word in an ad matches the content of the page, adding all of the eCPM bonuses gives us a total eCPM
    bonus that probably isn’t close to reality.

    We have eCPM bonuses based on word matching for each word that are based on
    single word matching page content alone, or with any one or a number of other words
    in the ad. What we really want to find is the weighted average of the eCPMs, where the
    weight of each word is the number of times the word matched the page content.
    Unfortunately, we can’t perform the weighted average calculation with Redis ZSETs,
    because we can’t divide one ZSET by another.

    Numerically speaking, the weighted average lies between the geometric average
    and the arithmetic average, both of which would be reasonable estimates of the combined
    eCPM. But we can’t calculate either of those averages when the count of matching
    words varies. The best estimate of the ad’s true eCPM is to find the maximum and
    minimum bonuses, calculate the average of those two, and use that as the bonus for
    multiple matching words.

    BEING MATHEMATICALLY RIGOROUSMathematically speaking, our method of
    averaging the maximum and minimum word bonuses to determine an overall
    bonus isn’t rigorous. The true mathematical expectation of the eCPM with a
    collection of matched words is different than what we calculated. We chose to
    use this mathematically nonrigorous method because it gets us to a reasonable
    answer (the weighted average of the words is between the minimum and
    maximum), with relatively little work. If you choose to use this bonus method
    along with our later learning methods, remember that there are better ways
    to target ads and to learn from user behavior. I chose this method because it’s
    easy to write, easy to learn, and easy to improve.

    We can calculate the maximum and minimum bonuses by using ZUNIONSTOREwith the
    MAX and MIN aggregates. And we can calculate their average by using them as part of a
    ZUNIONSTOREoperation with an aggregate of SUM, and their weights each being .5. Our
    function for combining bonus word eCPMs with the base eCPM of the ad can be seen
    in the next listing.

    Listing 7.13Calculating the eCPM of ads including content match bonuses
    def finish_scoring(pipe, matched, base, content):
        bonus_ecpm = {}
    
        words = tokenize(content)
    

    Tokenize the content for matching against ads.

        for word in words:
    
            word_bonus = zintersect(
                pipe, {matched: 0, word: 1}, _execute=False)
            bonus_ecpm[word_bonus] = 1
    

    Find the ads that are location targeted that also have one of the words in the content.

        if bonus_ecpm:
    
            minimum = zunion(
                pipe, bonus_ecpm, aggregate='MIN', _execute=False)
            maximum = zunion(
                pipe, bonus_ecpm, aggregate='MAX', _execute=False)
    

    Find the minimum and maximum eCPM bonuses for each ad.

            return words, zunion(
                pipe, {base:1, minimum:.5, maximum:.5}, _execute=False)
    

    Compute the total of the base, plus half of the minimum eCPM bonus, plus half of the maximum eCPM bonus.

        return words, base
    

    If there were no words in the content to match against, return just the known eCPM.

    As before, we continue to pass the _execute parameter to delay execution of our various
    ZINTERSTORE and ZUNIONSTOREoperations until after the function returns to the
    calling target_ads(). One thing that may be confusing is our use of ZINTERSTORE
    between the location-targeted ads and the bonuses, followed by a final ZUNIONSTORE
    call. Though we could perform fewer calls by performing a ZUNIONSTOREover all of
    our bonus ZSETs, followed by a single ZINTERSTORE call at the end (to match location),
    the majority of ad-targeting calls will perform better by performing many
    smaller intersections followed by a union.

    The difference between the two methods is illustrated in figure 7.5, which shows
    that essentially all of the data in all of the relevant word bonus ZSETs is examined
    when we union and then intersect. Compare that with figure 7.6, which shows that
    when we intersect and then union, Redis will examine far less data to produce the
    same result.

    After we have an ad targeted, we’re given a target_id and an ad_id. These IDs
    would then be used to construct an ad response that includes both pieces of information,
    in addition to fetching the ad copy, formatting the result, and returning the
    result to the web page or client that requested it.

    Figure 7.5The data that’s examined during a union-then-intersect calculation of ad-targeting bonuses includes all ads in the relevant word bonus ZSETs, even ads that don’t meet the location matching requirements.
    Figure 7.6The data that’s examined during an intersect-then-union calculation of ad-targeting bonuses only includes those ads that previously matched, which significantly cuts down on the amount of data that Redis will process.

    Exercise: No matching content

    If you pay careful attention to the flow of target_ads() through finish_scoring()
    in listings 7.11 and 7.13, you’ll notice that we don’t make any effort to deal with the
    case where an ad has zero matched words in the content. In that situation, the
    eCPM produced will actually be the average eCPM over all calls that returned that ad
    itself. It’s not difficult to see that this may result in an ad being shown that
    shouldn’t. Can you alter finish_scoring() to take into consideration ads that
    don’t match any content?

    The only part of our target_ads() function from listing 7.11 that we haven’t defined
    is record_targeting_result(), which we’ll now examine as part of the learning
    phase of ad targeting.

    4 It may seem strange to be estimating an expectation (which is arguably an estimate), but everything about
    targeting ads is fundamentally predicated on statistics of one kind or another. This is one of those cases where
    the basics can get us pretty far.