This book covers the use of Redis, an in-memory database/data structure server.


7.3.2 Indexing ads

The process of indexing an ad is not so different from the process of indexing any
other content. The primary difference is that we aren’t looking to return a list of ads
(or search results); we want to return a single ad. There are also some secondary differences
in that ads will typically have required targeting parameters such as location,
age, or gender.

As mentioned before, we’ll only be targeting based on location and content, so this
section will discuss how to index ads based on location and content. When you’ve seen
how to index and target based on location and content, targeting based on, for example,
age, gender, or recent behavior should be similar (at least on the indexing and
targeting side of things).

Before we can talk about indexing an ad, we must first determine how to measure
the value of an ad in a consistent manner.


Three major types of ads are shown on web pages: cost per view, cost per click, and cost per
action (or acquisition). Cost per view ads are also known as CPM or cost per mille, and
are paid a fixed rate per 1,000 views of the ad itself. Cost per click, or CPC, ads are paid
a fixed rate per click on the ad itself. Cost per action, or CPA, ads are paid a rate, which
can vary, based on actions performed on the advertiser’s destination site.

Making values consistent

To greatly simplify our calculations of the value of showing a given ad, we’ll convert
all of our types of ads to have values relative to 1,000 views, generating what’s known as
an estimated CPM, or eCPM. CPM ads are the easiest because their value per thousand
views is already provided, so eCPM = CPM. But for both CPC and CPA ads, we must calculate
the eCPMs.

Calculating the estimated CPM of a CPC ad

If we have a CPC ad, we start with its cost per click, say $.25. We then multiply that cost
by the click-through rate (CTR) on the ad. Click-through rate is the number of clicks that
an ad received divided by the number of views the ad received. We then multiply that
result by 1,000 to get our estimated CPM for that ad. If our ad gets .2% CTR, or .002, then
our calculation looks something like this: .25 x .002 x 1000 = $.50 eCPM.
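As a quick check of the arithmetic, here's a minimal Python sketch that computes eCPM from a cost per click and a click-through rate. The name `ecpm_from_ctr` is hypothetical; the helper in Listing 7.9 works from raw counts instead of a precomputed rate.

```python
def ecpm_from_ctr(cpc, ctr):
    """Estimated CPM of a CPC ad: cost per click x click-through rate x 1,000.

    Hypothetical helper for illustration; ctr is clicks / views as a fraction.
    """
    return cpc * ctr * 1000


# $.25 per click at a .2% (.002) click-through rate
ecpm = ecpm_from_ctr(0.25, 0.002)
print(round(ecpm, 2))  # 0.5
```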

Calculating the estimated CPM of a CPA ad

When we have a CPA ad, the calculation is somewhat similar to the CPC value calculation.
We start with the CTR of the ad, say .2%. We multiply that by the probability
that the user will perform an action on the advertiser’s destination page, maybe 10%
or .1. We then multiply that by the value of the action performed, and again multiply
that by 1,000 to get our estimated CPM. If our CPA is $3, our calculation would look
like this: .002 x .1 x 3 x 1000 = $.60 eCPM.
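The same check for the CPA example, again as a hypothetical helper (`cpa_ecpm_from_rates`) that takes rates rather than the raw counts Listing 7.9 uses:

```python
def cpa_ecpm_from_rates(ctr, action_rate, cpa):
    """Estimated CPM of a CPA ad from rates (hypothetical helper).

    ctr:         clicks / views
    action_rate: actions / clicks on the advertiser's destination page
    cpa:         amount paid per action
    """
    return ctr * action_rate * cpa * 1000


# .2% CTR, 10% of clickers perform the action, $3 per action
ecpm = cpa_ecpm_from_rates(0.002, 0.1, 3)
print(round(ecpm, 2))  # 0.6
```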

Two helper functions for calculating the eCPM of CPC and CPA ads are shown next.

Listing 7.9 Helper functions for turning information about CPC and CPA ads into eCPM
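def cpc_to_ecpm(views, clicks, cpc):
    # eCPM = 1000 * cost per click * (clicks / views)
    return 1000. * cpc * clicks / views

def cpa_to_ecpm(views, actions, cpa):
    # eCPM = 1000 * cost per action * (actions / views)
    return 1000. * cpa * actions / views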

Because click-through rate is clicks/views, and action rate is actions/clicks, when we multiply them together we get actions/views.
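Spelled out as a derivation, using CTR for the click-through rate and $r_a$ (a symbol introduced here) for the action rate, this is why `cpa_to_ecpm()` doesn't need the click count at all:

$$
\mathrm{CTR} \times r_a = \frac{\text{clicks}}{\text{views}} \times \frac{\text{actions}}{\text{clicks}} = \frac{\text{actions}}{\text{views}}
$$

$$
\mathrm{eCPM}_{\mathrm{CPA}} = 1000 \times \mathrm{CPA} \times \mathrm{CTR} \times r_a = 1000 \times \mathrm{CPA} \times \frac{\text{actions}}{\text{views}}
$$

With the numbers from the CPA example, .002 × .1 = .0002, or 2 actions per 10,000 views, so `cpa_to_ecpm(10000, 2, 3)` returns the same $.60 as the rate-based calculation.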

Notice that in our helper functions we used clicks, views, and actions directly instead
of the calculated CTR. This lets us keep these values directly in our accounting system,
only calculating the eCPM as necessary. Also notice that for our uses, CPC and CPA are
similar, the major difference being that for most ads, the number of actions is significantly
lower than the number of clicks, but the value per action is typically much
larger than the value per click.
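To illustrate keeping raw counts and deriving eCPM only when it's needed, here's a sketch that uses an in-memory record in place of a real accounting system. The `AdStats` class and its field names are hypothetical; the two helpers are copied from Listing 7.9 so the block runs on its own. The numbers reuse the CPC and CPA examples above, which also shows the trade-off just described: the CPA ad gets a tenth as many actions as clicks, but each action is worth twelve times as much as a click.

```python
from dataclasses import dataclass


# The two helpers from Listing 7.9, repeated so this block runs on its own.
def cpc_to_ecpm(views, clicks, cpc):
    return 1000. * cpc * clicks / views


def cpa_to_ecpm(views, actions, cpa):
    return 1000. * cpa * actions / views


@dataclass
class AdStats:
    """Hypothetical in-memory stand-in for an ad's accounting record."""
    ad_type: str      # 'cpc' or 'cpa'
    value: float      # price paid per click or per action
    views: int = 0
    clicks: int = 0
    actions: int = 0

    def ecpm(self):
        # Rates are never stored; eCPM is derived from the raw counts on demand.
        if self.ad_type == 'cpc':
            return cpc_to_ecpm(self.views, self.clicks, self.value)
        return cpa_to_ecpm(self.views, self.actions, self.value)


cpc_ad = AdStats('cpc', 0.25)
cpa_ad = AdStats('cpa', 3.0)
for ad in (cpc_ad, cpa_ad):
    ad.views += 10000   # both ads shown 10,000 times
cpc_ad.clicks += 20     # .2% CTR
cpa_ad.clicks += 20     # .2% CTR
cpa_ad.actions += 2     # 10% of clicks lead to an action

print(round(cpc_ad.ecpm(), 2), round(cpa_ad.ecpm(), 2))  # 0.5 0.6
```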

Now that we’ve calculated the basic value of an ad, let’s index an ad in preparation
for targeting.


When targeting an ad, we’ll have a group of optional and required targeting parameters.
In order to properly target an ad, our indexing of the ad must reflect the targeting requirements. Since we have two targeting options—location and content—we’ll say
that location is required (either on the city, state, or country level), but any matching
terms between the ad and the content of the page will be optional and a bonus.3

We’ll use the same search functions we defined in sections 7.1 and 7.2, with slightly
different indexing options. We’ll also assume that you’ve taken my advice from chapter
4 by splitting up your different types of services to different machines (or databases)
as necessary, so that your ad-targeting index doesn’t overlap with your other
content indexes.

As in section 7.1, we’ll create inverted indexes that use SETs and ZSETs to hold ad
IDs. Our SETs will hold the required location targeting, which provides no additional
bonus. When we talk about learning from user behavior, we’ll get into how we calculate
our per-matched-word bonus, but initially we won’t include any of our terms for
targeting bonuses, because we don’t know how much they may contribute to the overall
value of the ad. Our ad-indexing function is shown here.

Listing 7.10A method for indexing an ad that’s targeted on location and ad content
TO_ECPM = {
    'cpc': cpc_to_ecpm,
    'cpa': cpa_to_ecpm,
    'cpm': lambda *args:args[-1],
}

def index_ad(conn, id, locations, content, type, value):
    # Set up the pipeline so that we only need a single round trip
    # to perform the full index operation.
    pipeline = conn.pipeline(True)

    # Add the ad ID to all of the relevant location SETs for targeting.
    for location in locations:
        pipeline.sadd('idx:req:'+location, id)

    # Index the words for the ad (tokenize() comes from section 7.1).
    # redis-py 3.0+ takes zadd() scores as a {member: score} mapping.
    words = tokenize(content)
    for word in words:
        pipeline.zadd('idx:' + word, {id: 0})

    # We'll keep a dictionary, AVERAGE_PER_1K, that stores the average
    # number of clicks or actions per 1000 views on our network, for
    # estimating the performance of new ads.
    rvalue = TO_ECPM[type](
        1000, AVERAGE_PER_1K.get(type, 1), value)

    # Record what type of ad this is.
    pipeline.hset('type:', id, type)
    # Add the ad's eCPM to a ZSET of all ads.
    pipeline.zadd('idx:ad:value:', {id: rvalue})
    # Add the ad's base value to a ZSET of all ads.
    pipeline.zadd('ad:base_value:', {id: value})
    # Keep a record of the words that could be targeted for the ad
    # (SADD needs at least one member, so skip ads with no words).
    if words:
        pipeline.sadd('terms:' + id, *list(words))
    # Send all of the queued commands in a single transaction.
    pipeline.execute()


As shown in the listing and its annotations, the ad-indexing function makes three important
additions. The first is that an ad can actually have multiple targeted locations.
This is necessary to allow a single ad to be targeted for any one of multiple locations
at the same time (like multiple cities, states, or countries).
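To see why indexing one ad under several location keys is enough, here's a sketch that uses plain Python sets in place of Redis SETs. The `idx:req:` key prefix matches the listing; the location strings, the `index_locations` and `ads_for_request` helpers, and the idea of checking a request's city, state, and country together are illustrative assumptions, not the book's targeting code.

```python
from collections import defaultdict

# Stand-in for Redis: key -> set of ad IDs (like SADD on 'idx:req:<location>').
index = defaultdict(set)


def index_locations(ad_id, locations):
    # Mirrors the loop in index_ad(): one SET membership per targeted location.
    for location in locations:
        index['idx:req:' + location].add(ad_id)


def ads_for_request(request_locations):
    # Union of the location SETs for every level the request belongs to
    # (the same idea as SUNION over the matching 'idx:req:' keys).
    matched = set()
    for location in request_locations:
        matched |= index.get('idx:req:' + location, set())
    return matched


index_locations('ad1', ['city:Austin', 'city:Dallas'])  # two cities
index_locations('ad2', ['country:US'])                  # a whole country
index_locations('ad3', ['state:CA'])                    # a single state

# A viewer in Austin, Texas, US matches ad1 (city) and ad2 (country), not ad3.
print(sorted(ads_for_request(['city:Austin', 'state:TX', 'country:US'])))
# ['ad1', 'ad2']
```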

The second is that we’ll keep a dictionary that holds information about the average
number of clicks and actions across the entire system. This lets us come up with a reasonable
estimate of the eCPM for CPC and CPA ads before they’ve even been seen in
the system.4
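Here's a sketch of how that dictionary feeds a new ad's starting eCPM. The averages in `AVERAGE_PER_1K` are made-up values chosen to match the earlier examples (2 clicks per 1,000 views is a .2% CTR; 0.2 actions per 1,000 views is that CTR times a 10% action rate), and `initial_ecpm` is a hypothetical wrapper around the same expression `index_ad()` uses. As in the listing, types missing from the dictionary fall back to 1 per 1,000 views.

```python
# Listing 7.9 helpers and the TO_ECPM mapping from Listing 7.10, repeated here.
def cpc_to_ecpm(views, clicks, cpc):
    return 1000. * cpc * clicks / views


def cpa_to_ecpm(views, actions, cpa):
    return 1000. * cpa * actions / views


TO_ECPM = {
    'cpc': cpc_to_ecpm,
    'cpa': cpa_to_ecpm,
    'cpm': lambda *args: args[-1],
}

# Hypothetical network-wide averages: clicks (cpc) or actions (cpa) per 1,000 views.
AVERAGE_PER_1K = {'cpc': 2, 'cpa': 0.2}


def initial_ecpm(ad_type, value):
    # Same expression index_ad() uses: treat the new ad as if it had 1,000 views
    # and performed exactly like the average ad of its type.
    return TO_ECPM[ad_type](1000, AVERAGE_PER_1K.get(ad_type, 1), value)


print(round(initial_ecpm('cpc', 0.25), 2))  # 0.5
print(round(initial_ecpm('cpa', 3), 2))     # 0.6
print(initial_ecpm('cpm', 1.5))             # 1.5
```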

Finally, we’ll also keep a SET of all of the terms that we can optionally target in the
ad. I include this information as a precursor to learning about user behavior a little later.

It’s now time to search for and discover ads that match an ad request.

3 If ad copy matches page content, then the ad looks like the page and will be more likely to be clicked on than
an ad that doesn’t look like the page content.