EBOOK – REDIS IN ACTION

This book covers the use of Redis, an in-memory database/data structure server.


7.4.2 Approaching the problem like search

In section 7.3.3, we used SETs and ZSETs as holders for additive bonuses for optional
targeting parameters. If we’re careful, we can do the same thing for groups of
required targeting parameters.

Rather than talk about jobs with skills, we need to flip the problem around like we
did with the other search problems described in this chapter. We start with one SET
per skill, which stores all of the jobs that require that skill. In a required skills ZSET, we
store the total number of skills that a job requires. The code that sets up our index
looks like the next listing.

Listing 7.18 A function for indexing jobs based on the required skills
def index_job(conn, job_id, skills):
    pipeline = conn.pipeline(True)
    # Add the job ID to all appropriate skill SETs.
    for skill in skills:
        pipeline.sadd('idx:skill:' + skill, job_id)
    # Add the total required skill count to the required skills ZSET.
    # This uses the older redis-py call style zadd(key, member, score);
    # redis-py 3.0 and later take a mapping: zadd(key, {member: score}).
    pipeline.zadd('idx:jobs:req', job_id, len(set(skills)))
    pipeline.execute()

This indexing function should remind you of the text indexing function we used in
section 7.1. The only major difference is that we’re providing index_job() with pretokenized
skills, and we’re adding a member to a ZSET that keeps a record of the number
of skills that each job requires.
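
To make the shape of the index concrete, the following is a minimal in-memory sketch of what index_job() writes, using plain Python dicts and sets in place of Redis. The names skill_index, required_counts, and index_job_local are hypothetical stand-ins chosen for illustration; they aren't part of the listing above.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the Redis keys written by index_job():
#   skill_index[skill]       ~ the SET  idx:skill:<skill>
#   required_counts[job_id]  ~ the ZSET idx:jobs:req (member -> score)
skill_index = defaultdict(set)
required_counts = {}

def index_job_local(job_id, skills):
    unique = set(skills)                    # duplicate skills count once
    for skill in unique:
        skill_index[skill].add(job_id)      # like SADD idx:skill:<skill> job_id
    required_counts[job_id] = len(unique)   # like ZADD idx:jobs:req

index_job_local('job1', ['python', 'redis', 'python'])
index_job_local('job2', ['python'])
```

After these two calls, the 'python' entry holds both job IDs, the 'redis' entry holds only 'job1', and the required counts are 2 for 'job1' and 1 for 'job2'.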

To perform a search for jobs that a candidate has all of the skills for, we need to
approach the search like we did with the bonuses to ad targeting in section 7.3.3.
More specifically, we’ll perform a ZUNIONSTORE operation over skill SETs to calculate a
total score for each job. This score represents how many skills the candidate has for
each of the jobs.

Because we have a ZSET with the total number of skills required, we can then perform
a ZINTERSTORE operation between the candidate’s ZSET and the required skills
ZSET with weights -1 and 1, respectively. Any job ID with a score equal to 0 in that final
result ZSET is a job that the candidate has all of the required skills for. The code for
implementing the search operation is shown in the following listing.

Listing 7.19 Find all jobs that a candidate is qualified for
def find_jobs(conn, candidate_skills):
    # Set up the dictionary for scoring the jobs.
    skills = {}
    for skill in set(candidate_skills):
        skills['skill:' + skill] = 1

    # Calculate the scores for each of the jobs.
    job_scores = zunion(conn, skills)

    # Calculate how many more skills the job requires than the candidate has.
    final_result = zintersect(
        conn, {job_scores: -1, 'jobs:req': 1})

    # Return the jobs that the candidate has the skills for.
    return conn.zrangebyscore('idx:' + final_result, 0, 0)

Again, we first find the scores for each job. After we have the scores for each job, we
subtract each job score from the total score necessary to match. In that final result,
any job with a ZSET score of 0 is a job that the candidate has all of the skills for.
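
The arithmetic behind those three steps can be sketched without Redis. The following is a minimal in-memory illustration, not the book's implementation: find_jobs_local and the sample data are hypothetical, with a dict of sets standing in for the skill SETs and a dict of counts standing in for the required skills ZSET.

```python
def find_jobs_local(skill_index, required_counts, candidate_skills):
    # Step 1, like ZUNIONSTORE over the skill SETs with each member scoring 1:
    # count how many of the candidate's skills each job asks for.
    matched = {}
    for skill in set(candidate_skills):
        for job_id in skill_index.get(skill, ()):
            matched[job_id] = matched.get(job_id, 0) + 1

    # Step 2, like ZINTERSTORE with weights -1 and 1 (scores are summed):
    # required - matched, kept only for jobs present in both inputs.
    missing = {
        job_id: required_counts[job_id] - count
        for job_id, count in matched.items()
        if job_id in required_counts
    }

    # Step 3, like ZRANGEBYSCORE ... 0 0: keep jobs with nothing missing.
    return sorted(job_id for job_id, gap in missing.items() if gap == 0)

# Hypothetical sample data.
sample_index = {
    'python': {'job1', 'job2'},
    'redis': {'job1'},
    'sql': {'job3'},
}
sample_required = {'job1': 2, 'job2': 1, 'job3': 1}

only_python = find_jobs_local(sample_index, sample_required, ['python'])
python_redis = find_jobs_local(sample_index, sample_required, ['python', 'redis'])
```

With this data, a candidate who knows only Python matches 'job2' (nothing missing) but not 'job1' (one skill missing), while a candidate with both Python and Redis matches 'job1' and 'job2'. One consequence of the intersection step is that a job indexed with no skills never appears in the union, so it's never returned.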

Depending on the number of jobs and searches that are being performed, our job-search
system may not perform as fast as we need it to, especially with large
numbers of jobs or searches. But if we apply sharding techniques that we’ll discuss in
chapter 9, we can break the large calculations into smaller pieces and calculate partial
results bit by bit. Alternatively, if we first find the SET of jobs in a location to search for
jobs, we could perform the same kind of optimization that we performed with ad targeting
in section 7.3.3, which could greatly improve job-search performance.
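
As a rough illustration of the location-first idea, the sketch below narrows each skill's jobs to those in one location before counting matches, so the later steps touch fewer members. It's an in-memory approximation under stated assumptions: location_jobs is a hypothetical set of job IDs for one location (this section doesn't define a location index), and find_jobs_in_location is likewise a made-up name.

```python
def find_jobs_in_location(skill_index, required_counts, location_jobs,
                          candidate_skills):
    matched = {}
    for skill in set(candidate_skills):
        # Intersect with the location's jobs first, so only jobs in that
        # location are ever counted.
        for job_id in skill_index.get(skill, set()) & location_jobs:
            matched[job_id] = matched.get(job_id, 0) + 1
    # A job qualifies when the matched count equals its required count.
    return sorted(
        job_id for job_id, count in matched.items()
        if required_counts.get(job_id) == count
    )

# Hypothetical sample data.
loc_index = {'python': {'job1', 'job2', 'job3'}, 'redis': {'job1'}}
loc_required = {'job1': 2, 'job2': 1, 'job3': 1}
nearby = {'job1', 'job3'}   # assumed: the job IDs indexed for one location

local_python = find_jobs_in_location(loc_index, loc_required, nearby, ['python'])
local_both = find_jobs_in_location(
    loc_index, loc_required, nearby, ['python', 'redis'])
```

Here 'job2' is never considered because it's outside the location, even though the candidate qualifies for it.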

Exercise: Levels of experience

A natural extension to the simple required skills listing is an understanding that skill
levels vary from beginner to intermediate, to expert, and beyond. Can you come up
with a method using additional SETs to offer the ability, for example, for someone
who has an intermediate level in a skill to find jobs that require either beginner or
intermediate-level candidates?

Exercise: Years of experience

Levels of expertise can be useful, but another way to look at the amount of experience
someone has is the number of years they’ve used a skill. Can you build an alternate
version that supports handling arbitrary numbers of years of experience?