e-Book - Redis in Action

This book covers the use of Redis, an in-memory database/data structure server.

    7.3.2 Indexing ads

    The process of indexing an ad is not so different from the process of indexing any
    other content. The primary difference is that we aren’t looking to return a list of ads
    (or search results); we want to return a single ad. There are also some secondary differences
    in that ads will typically have required targeting parameters such as location,
    age, or gender.

    As mentioned before, we’ll only be targeting based on location and content, so this
    section will discuss how to index ads based on location and content. When you’ve seen
    how to index and target based on location and content, targeting based on, for example,
    age, gender, or recent behavior should be similar (at least on the indexing and
    targeting side of things).

    Before we can talk about indexing an ad, we must first determine how to measure
    the value of an ad in a consistent manner.

    CALCULATING THE VALUE OF AN AD

    Three major types of ads are shown on web pages: cost per view, cost per click, and cost per
    action (or acquisition). Cost per view ads are also known as CPM or cost per mille, and
    pay a fixed rate per 1,000 views of the ad itself. Cost per click, or CPC, ads pay
    a fixed rate per click on the ad itself. Cost per action, or CPA, ads pay a sometimes
    varying rate based on actions performed on the ad-destination site.

    Making values consistent

    To greatly simplify our calculations as to the value of showing a given ad, we’ll convert
    all of our types of ads to have values relative to 1,000 views, generating what’s known as
    an estimated CPM, or eCPM. CPM ads are the easiest because their value per thousand
    views is already provided, so eCPM = CPM. But for both CPC and CPA ads, we must calculate
    the eCPMs.

    Calculating the estimated CPM of a CPC ad

    If we have a CPC ad, we start with its cost per click, say $0.25. We then multiply that cost
    by the click-through rate (CTR) on the ad. Click-through rate is the number of clicks that
    an ad received divided by the number of views the ad received. We then multiply that
    result by 1,000 to get our estimated CPM for that ad. If our ad gets a 0.2% CTR, or 0.002, then
    our calculation looks something like this: 0.25 x 0.002 x 1,000 = $0.50 eCPM.

    Calculating the estimated CPM of a CPA ad

    When we have a CPA ad, the calculation is somewhat similar to the CPC value calculation.
    We start with the CTR of the ad, say 0.2%. We multiply that by the probability
    that the user will perform an action on the advertiser’s destination page, maybe 10%
    or 0.1. We then multiply that by the value of the action performed, and again multiply
    that by 1,000 to get our estimated CPM. If our CPA is $3, our calculation would look
    like this: 0.002 x 0.1 x 3 x 1,000 = $0.60 eCPM.

    Two helper functions for calculating the eCPM of CPC and CPA ads are shown next.

    Listing 7.9 Helper functions for turning information about CPC and CPA ads into eCPM
    def cpc_to_ecpm(views, clicks, cpc):
        return 1000. * cpc * clicks / views

    def cpa_to_ecpm(views, actions, cpa):
        # Because click-through rate is clicks/views, and action rate is
        # actions/clicks, when we multiply them together we get actions/views.
        return 1000. * cpa * actions / views

    Notice that in our helper functions we used clicks, views, and actions directly instead
    of the calculated CTR. This lets us keep these values directly in our accounting system,
    only calculating the eCPM as necessary. Also notice that for our uses, CPC and CPA are
    similar, the major difference being that for most ads, the number of actions is significantly
    lower than the number of clicks, but the value per action is typically much
    larger than the value per click.
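
    As a quick sanity check, the two worked examples from earlier in this section can be
    reproduced with the helpers. This is only a sketch: the helper definitions are repeated so
    that the snippet runs on its own, and the view, click, and action counts are made-up
    figures chosen to match the 0.2% CTR and 10% action rate used in the text.

```python
def cpc_to_ecpm(views, clicks, cpc):
    return 1000. * cpc * clicks / views

def cpa_to_ecpm(views, actions, cpa):
    return 1000. * cpa * actions / views

# Illustrative counts (assumptions, not real data):
# 10,000 views and 20 clicks give a 0.2% click-through rate.
views, clicks = 10000, 20
# A 10% action rate on those 20 clicks gives 2 actions.
actions = 2

cpc_ecpm = cpc_to_ecpm(views, clicks, 0.25)   # $0.25 per click
cpa_ecpm = cpa_to_ecpm(views, actions, 3.00)  # $3 per action

print(round(cpc_ecpm, 2))
print(round(cpa_ecpm, 2))
```

    Passing raw counts rather than a precomputed CTR means the same call keeps working as
    the counters in the accounting system grow.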

    Now that we’ve calculated the basic value of an ad, let’s index an ad in preparation
    for targeting.

    INSERTING AN AD INTO THE INDEX

    When targeting an ad, we’ll have a group of optional and required targeting parameters.
    In order to properly target an ad, our indexing of the ad must reflect the targeting requirements. Since we have two targeting options—location and content—we’ll say
    that location is required (either on the city, state, or country level), but any matching
    terms between the ad and the content of the page will be optional and a bonus.3

    We’ll use the same search functions we defined in sections 7.1 and 7.2, with slightly
    different indexing options. We’ll also assume that you’ve taken my advice from chapter
    4 by splitting up your different types of services to different machines (or databases)
    as necessary, so that your ad-targeting index doesn’t overlap with your other
    content indexes.

    As in section 7.1, we’ll create inverted indexes that use SETs and ZSETs to hold ad
    IDs. Our SETs will hold the required location targeting, which provides no additional
    bonus. When we talk about learning from user behavior, we’ll get into how we calculate
    our per-matched-word bonus, but initially we won’t include any of our terms for
    targeting bonuses, because we don’t know how much they may contribute to the overall
    value of the ad. Our ad-indexing function is shown here.

    Listing 7.10 A method for indexing an ad that’s targeted on location and ad content
    TO_ECPM = {
        'cpc': cpc_to_ecpm,
        'cpa': cpa_to_ecpm,
        'cpm': lambda *args:args[-1],
    }

    def index_ad(conn, id, locations, content, type, value):
        # Set up the pipeline so that we only need a single round trip
        # to perform the full index operation.
        pipeline = conn.pipeline(True)

        # Add the ad ID to all of the relevant location SETs for targeting.
        for location in locations:
            pipeline.sadd('idx:req:'+location, id)

        # Index the words for the ad, tokenizing the content only once.
        # zadd(key, member, score) is the redis-py 2.x calling form;
        # redis-py 3.0 and later take a mapping: zadd(key, {member: score}).
        words = tokenize(content)
        for word in words:
            pipeline.zadd('idx:' + word, id, 0)

        # We'll keep a dictionary that stores the average number of clicks
        # or actions per 1000 views on our network, for estimating the
        # performance of new ads.
        rvalue = TO_ECPM[type](
            1000, AVERAGE_PER_1K.get(type, 1), value)
        # Record what type of ad this is.
        pipeline.hset('type:', id, type)
        # Add the ad's eCPM to a ZSET of all ads.
        pipeline.zadd('idx:ad:value:', id, rvalue)
        # Add the ad's base value to a ZSET of all ads.
        pipeline.zadd('ad:base_value:', id, value)
        # Keep a record of the words that could be targeted for the ad.
        pipeline.sadd('terms:' + id, *list(words))
        pipeline.execute()

    As shown in the listing and described in the annotations, we made three important
    additions to the listing. The first is that an ad can actually have multiple targeted locations.
    This is necessary to allow a single ad to be targeted for any one of multiple locations
    at the same time (like multiple cities, states, or countries).

    The second is that we’ll keep a dictionary that holds information about the average
    number of clicks and actions across the entire system. This lets us come up with a reasonable
    estimate on the eCPM for CPC and CPA ads before they’ve even been seen in
    the system.4
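
    To see how that dictionary feeds the initial estimate, here is a minimal sketch that
    evaluates the same expression index_ad uses for rvalue. The numbers in AVERAGE_PER_1K
    are made up for illustration, and estimate_initial_ecpm is a hypothetical wrapper that
    isn’t part of the listing; the helpers are repeated so that the snippet runs on its own.

```python
def cpc_to_ecpm(views, clicks, cpc):
    return 1000. * cpc * clicks / views

def cpa_to_ecpm(views, actions, cpa):
    return 1000. * cpa * actions / views

TO_ECPM = {
    'cpc': cpc_to_ecpm,
    'cpa': cpa_to_ecpm,
    'cpm': lambda *args: args[-1],   # CPM ads: eCPM is just the value passed in
}

# Hypothetical network-wide averages per 1,000 views (made-up numbers):
# 2 clicks and 0.2 actions per 1,000 views.
AVERAGE_PER_1K = {'cpc': 2, 'cpa': 0.2}

def estimate_initial_ecpm(type, value):
    # Same expression index_ad uses to compute rvalue for a new ad.
    return TO_ECPM[type](1000, AVERAGE_PER_1K.get(type, 1), value)

for ad_type, value in [('cpc', 0.25), ('cpa', 3.00), ('cpm', 0.75)]:
    print(ad_type, round(estimate_initial_ecpm(ad_type, value), 2))
```

    For a CPM ad there’s no entry in the dictionary, so the default of 1 is passed along
    and ignored, and the ad’s own CPM comes back unchanged.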

    Finally, we’ll also keep a SET of all of the terms that we can optionally target in the
    ad. I include this information as a precursor to learning about user behavior a little
    later.
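
    To make the resulting key layout concrete, the following sketch lists the Redis keys
    that a single call to index_ad touches. It doesn’t talk to Redis at all; ad_index_keys
    is a hypothetical helper written only for this illustration, and the ad ID, locations,
    and words are invented sample values.

```python
def ad_index_keys(id, locations, words):
    """Return the keys index_ad writes to, grouped by the structure they hold."""
    return {
        # One SET per required location, each holding ad IDs.
        'location SETs': ['idx:req:' + loc for loc in locations],
        # One ZSET per word in the ad, each mapping ad ID -> bonus (0 at first).
        'word ZSETs': ['idx:' + word for word in sorted(words)],
        # A single HASH mapping ad ID -> ad type ('cpc', 'cpa', or 'cpm').
        'type HASH': 'type:',
        # A single ZSET mapping ad ID -> estimated CPM.
        'eCPM ZSET': 'idx:ad:value:',
        # A single ZSET mapping ad ID -> the base CPM/CPC/CPA value.
        'base value ZSET': 'ad:base_value:',
        # One SET per ad holding the words that could be targeted.
        'terms SET': 'terms:' + id,
    }

# Invented sample ad: ID '1', targeted at two locations, three words.
keys = ad_index_keys('1', ['USA', 'CA'], {'look', 'diet', 'pills'})
for kind, value in keys.items():
    print(kind, '->', value)
```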

    It’s now time to search for and discover ads that match an ad request.

    3 If ad copy matches page content, then the ad looks like the page and will be more likely to be clicked on than
    an ad that doesn’t look like the page content.