This book covers the use of Redis, an in-memory database/data structure server.

open all | close all

7.3.3 Targeting ads

As described earlier, when we receive a request to target an ad, we’re generally looking
to find the highest eCPM ad that matches the viewer’s location. In addition to
matching an ad based on a location, we’ll gather and use statistics about how the content
of a page matches the content of an ad, and what that can do to affect the ad’s
CTR. Using these statistics, content in an ad that matches a web page can result in
bonuses that contribute to the calculated eCPM of CPC and CPA ads, which can result
in those types of ads being shown more.

Before showing any ads, we won’t have any bonus scoring for any of our web page
content. But as we show ads, we’ll learn about what terms in the ads help or hurt the
ad’s expected performance, which will allow us to change the relative value of each of
the optional targeting terms.

To execute the targeting, we’ll union all of the relevant location SETs to produce an
initial group of ads that should be shown to the viewer. Then we’ll parse the content of
the page on which the ad will be shown, add any relevant bonuses, and finally calculate
a total eCPM for all ads that a viewer could be shown. After we’ve calculated those
eCPMs, we’ll fetch the ad ID for the highest eCPM ad, record some statistics about our
targeting, and return the ad itself. Our code for targeting an ad looks like this.

Listing 7.11Ad targeting by location and page content bonuses
def target_ads(conn, locations, content):
    pipeline = conn.pipeline(True)
    matched_ads, base_ecpm = match_location(pipeline, locations)

Find all ads that fit the location targeting parameter, and their eCPMs.

    words, targeted_ads = finish_scoring(

Finish any bonus scoring based on matching the content.

        pipeline, matched_ads, base_ecpm, content)

Get an ID that can be used for reporting and recording of this particular ad target.

    pipeline.zrevrange('idx:' + targeted_ads, 0, 0)

Fetch the top-eCPM ad ID.

    target_id, targeted_ad = pipeline.execute()[-2:]
    if not targeted_ad:
        return None, None

If there were no ads that matched the location targeting, return nothing.

    ad_id = targeted_ad[0]
    record_targeting_result(conn, target_id, ad_id, words)

Record the results of our targeting efforts as part of our learning process.

    return target_id, ad_id

Return the target ID and the ad ID to the caller.

In this first version, we hide away the details of exactly how we’re matching based on
location and adding bonuses based on terms so that we can understand the general
flow. The only part I didn’t mention earlier is that we’ll also generate a target ID, which
is an ID that represents this particular execution of the ad targeting. This ID will allow
us to track clicks later, helping us learn about which parts of the ad targeting may have
contributed to the overall total clicks.

As mentioned earlier, in order to match based on location, we’ll perform a SET
union over the locations (city, state, country) that the viewer is coming from. While
we’re here, we’ll also calculate the base eCPM of these ads without any bonuses
applied. The code for performing this operation is shown next.

Listing 7.12A helper function for targeting ads based on location
def match_location(pipe, locations):
    required = ['req:' + loc for loc in locations]

Calculate the SET key names for all of the provided locations.

    matched_ads = union(pipe, required, ttl=300, _execute=False)

Calculate the SET of matched ads that are valid for this location.

    return matched_ads, zintersect(pipe,
        {matched_ads: 0, 'ad:value:': 1}, _execute=False)

Return the matched ads SET ID, as well as the ID of the ZSET that includes the base eCPM of all of the matched ads.

This code listing does exactly what I said it would do: it finds ads that match the location
of the viewer, and it calculates the eCPM of all of those ads without any bonuses
based on page content applied. The only thing that may be surprising about this listing
is our passing of the funny _execute keyword argument to the zintersect()
function, which delays the actual execution of calculating the eCPM of the ad until
later. The purpose of waiting until later is to help minimize the number of round trips
between our client and Redis.


The interesting part of targeting isn’t the location matching; it’s calculating the
bonuses. We’re trying to discover the amount to add to an ad’s eCPM, based on words
in the page content that matched words in the ad. We’ll assume that we’ve precalculated
a bonus for each word in each ad (as part of the learning phase), stored in ZSETs for each
word, with members being ad IDs and scores being what should be added to the eCPM.

These word-based, per-ad eCPM bonuses have values such that the average eCPM of
the ad, when shown on a page with that word, is the eCPM bonus from the word plus
the known average CPM for the ad. Unfortunately, when more than one word in an ad matches the content of the page, adding all of the eCPM bonuses gives us a total eCPM
bonus that probably isn’t close to reality.

We have eCPM bonuses based on word matching for each word that are based on
single word matching page content alone, or with any one or a number of other words
in the ad. What we really want to find is the weighted average of the eCPMs, where the
weight of each word is the number of times the word matched the page content.
Unfortunately, we can’t perform the weighted average calculation with Redis ZSETs,
because we can’t divide one ZSET by another.

Numerically speaking, the weighted average lies between the geometric average
and the arithmetic average, both of which would be reasonable estimates of the combined
eCPM. But we can’t calculate either of those averages when the count of matching
words varies. The best estimate of the ad’s true eCPM is to find the maximum and
minimum bonuses, calculate the average of those two, and use that as the bonus for
multiple matching words.

BEING MATHEMATICALLY RIGOROUSMathematically speaking, our method of
averaging the maximum and minimum word bonuses to determine an overall
bonus isn’t rigorous. The true mathematical expectation of the eCPM with a
collection of matched words is different than what we calculated. We chose to
use this mathematically nonrigorous method because it gets us to a reasonable
answer (the weighted average of the words is between the minimum and
maximum), with relatively little work. If you choose to use this bonus method
along with our later learning methods, remember that there are better ways
to target ads and to learn from user behavior. I chose this method because it’s
easy to write, easy to learn, and easy to improve.

We can calculate the maximum and minimum bonuses by using ZUNIONSTOREwith the
MAX and MIN aggregates. And we can calculate their average by using them as part of a
ZUNIONSTOREoperation with an aggregate of SUM, and their weights each being .5. Our
function for combining bonus word eCPMs with the base eCPM of the ad can be seen
in the next listing.

Listing 7.13Calculating the eCPM of ads including content match bonuses
def finish_scoring(pipe, matched, base, content):
    bonus_ecpm = {}
    words = tokenize(content)

Tokenize the content for matching against ads.

    for word in words:
        word_bonus = zintersect(
            pipe, {matched: 0, word: 1}, _execute=False)
        bonus_ecpm[word_bonus] = 1

Find the ads that are location targeted that also have one of the words in the content.

    if bonus_ecpm:
        minimum = zunion(
            pipe, bonus_ecpm, aggregate='MIN', _execute=False)
        maximum = zunion(
            pipe, bonus_ecpm, aggregate='MAX', _execute=False)

Find the minimum and maximum eCPM bonuses for each ad.

        return words, zunion(
            pipe, {base:1, minimum:.5, maximum:.5}, _execute=False)

Compute the total of the base, plus half of the minimum eCPM bonus, plus half of the maximum eCPM bonus.

    return words, base

If there were no words in the content to match against, return just the known eCPM.

As before, we continue to pass the _execute parameter to delay execution of our various
ZINTERSTORE and ZUNIONSTOREoperations until after the function returns to the
calling target_ads(). One thing that may be confusing is our use of ZINTERSTORE
between the location-targeted ads and the bonuses, followed by a final ZUNIONSTORE
call. Though we could perform fewer calls by performing a ZUNIONSTOREover all of
our bonus ZSETs, followed by a single ZINTERSTORE call at the end (to match location),
the majority of ad-targeting calls will perform better by performing many
smaller intersections followed by a union.

The difference between the two methods is illustrated in figure 7.5, which shows
that essentially all of the data in all of the relevant word bonus ZSETs is examined
when we union and then intersect. Compare that with figure 7.6, which shows that
when we intersect and then union, Redis will examine far less data to produce the
same result.

After we have an ad targeted, we’re given a target_id and an ad_id. These IDs
would then be used to construct an ad response that includes both pieces of information,
in addition to fetching the ad copy, formatting the result, and returning the
result to the web page or client that requested it.

Figure 7.5The data that’s examined during a union-then-intersect calculation of ad-targeting bonuses includes all ads in the relevant word bonus ZSETs, even ads that don’t meet the location matching requirements.

Figure 7.6The data that’s examined during an intersect-then-union calculation of ad-targeting bonuses only includes those ads that previously matched, which significantly cuts down on the amount of data that Redis will process.

Exercise: No matching content

If you pay careful attention to the flow of target_ads() through finish_scoring()
in listings 7.11 and 7.13, you’ll notice that we don’t make any effort to deal with the
case where an ad has zero matched words in the content. In that situation, the
eCPM produced will actually be the average eCPM over all calls that returned that ad
itself. It’s not difficult to see that this may result in an ad being shown that
shouldn’t. Can you alter finish_scoring() to take into consideration ads that
don’t match any content?

The only part of our target_ads() function from listing 7.11 that we haven’t defined
is record_targeting_result(), which we’ll now examine as part of the learning
phase of ad targeting.

4 It may seem strange to be estimating an expectation (which is arguably an estimate), but everything about
targeting ads is fundamentally predicated on statistics of one kind or another. This is one of those cases where
the basics can get us pretty far.