EBOOK – REDIS IN ACTION

This book covers the use of Redis, an in-memory database/data structure server.


9.3.1 What location information should we store?

When I mentioned locations, you were probably wondering what I meant. Well, we
could store a variety of different types of locations. With 1 byte, we could store country-level
information for the world. With 2 bytes, we could store region/state-level information.
With 3 bytes, we could store regional postal codes for almost every country. And
with 4 bytes, we could store latitude/longitude information down to within about 2
meters or 6 feet.
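To make the byte budget concrete, n bytes can distinguish 256 to the power n different values. The short sketch below just prints that count for 1 through 4 bytes; the helper name distinct_values is ours for illustration and isn’t part of the listings that follow.

```python
def distinct_values(num_bytes):
    # Each byte holds 8 bits, so it can take 256 different values.
    return 256 ** num_bytes

for num_bytes in (1, 2, 3, 4):
    print(num_bytes, distinct_values(num_bytes))
# 1 256
# 2 65536
# 3 16777216
# 4 4294967296
```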

Which level of precision to use will depend on our given use case. For the sake of
simplicity, we’ll start with just 2 bytes for region/state-level information for countries
around the world. As a starter, listing 9.13 shows some base data for ISO3 country
codes around the world, as well as state/province information for the United States
and Canada.

Listing 9.13 Base location tables we can expand as necessary
COUNTRIES = '''
ABW AFG AGO AIA ALA ALB AND ARE ARG ARM ASM ATA ATF ATG AUS AUT AZE BDI
BEL BEN BES BFA BGD BGR BHR BHS BIH BLM BLR BLZ BMU BOL BRA BRB BRN BTN
BVT BWA CAF CAN CCK CHE CHL CHN CIV CMR COD COG COK COL COM CPV CRI CUB
CUW CXR CYM CYP CZE DEU DJI DMA DNK DOM DZA ECU EGY ERI ESH ESP EST ETH
FIN FJI FLK FRA FRO FSM GAB GBR GEO GGY GHA GIB GIN GLP GMB GNB GNQ GRC
GRD GRL GTM GUF GUM GUY HKG HMD HND HRV HTI HUN IDN IMN IND IOT IRL IRN
IRQ ISL ISR ITA JAM JEY JOR JPN KAZ KEN KGZ KHM KIR KNA KOR KWT LAO LBN
LBR LBY LCA LIE LKA LSO LTU LUX LVA MAC MAF MAR MCO MDA MDG MDV MEX MHL
MKD MLI MLT MMR MNE MNG MNP MOZ MRT MSR MTQ MUS MWI MYS MYT NAM NCL NER
NFK NGA NIC NIU NLD NOR NPL NRU NZL OMN PAK PAN PCN PER PHL PLW PNG POL
PRI PRK PRT PRY PSE PYF QAT REU ROU RUS RWA SAU SDN SEN SGP SGS SHN SJM
SLB SLE SLV SMR SOM SPM SRB SSD STP SUR SVK SVN SWE SWZ SXM SYC SYR TCA
TCD TGO THA TJK TKL TKM TLS TON TTO TUN TUR TUV TWN TZA UGA UKR UMI URY
USA UZB VAT VCT VEN VGB VIR VNM VUT WLF WSM YEM ZAF ZMB ZWE'''.split()

A table of ISO3 country codes. Calling split() will split the string on whitespace, turning the string into a list of country codes.

STATES = {
   'CAN':'''AB BC MB NB NL NS NT NU ON PE QC SK YT'''.split(),

Province/territory information for Canada.

   'USA':'''AA AE AK AL AP AR AS AZ CA CO CT DC DE FL FM GA GU HI IA ID
IL IN KS KY LA MA MD ME MH MI MN MO MP MS MT NC ND NE NH NJ NM NV NY OH
OK OR PA PR PW RI SC SD TN TX UT VA VI VT WA WI WV WY'''.split(),

State information for the United States.

}

I introduce these tables first so that if we later want to add state, region, territory,
or province information for other countries we’re interested in, the format and method
for doing so should be obvious. Looking at the data tables, we initially define them as
strings, and then convert each string into a list by calling split() with no arguments,
which splits on whitespace. Now that we have some initial data, how are we going to
store this information on a per-user basis?
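As a quick check of how the tables are built, the sketch below uses abbreviated stand-ins for COUNTRIES and STATES (not the full tables from listing 9.13). The check_table helper is hypothetical; it verifies two properties that the lookup function in the next listing appears to rely on: the codes are sorted, because the lookup uses binary search, and there are at most 255 of them, because each index plus 1 has to fit in a single byte with 0 reserved for “not found.”

```python
# Abbreviated stand-ins for the tables in listing 9.13.
COUNTRIES = '''
ABW AFG AGO
CAN USA'''.split()

STATES = {
    'CAN': '''AB BC MB'''.split(),
}

# split() with no arguments splits on any run of whitespace,
# including the newlines inside the triple-quoted string.
print(COUNTRIES)   # ['ABW', 'AFG', 'AGO', 'CAN', 'USA']

def check_table(codes):
    # Hypothetical sanity check: sorted for binary search, and small
    # enough that index + 1 fits in one byte (0 means "not found").
    return codes == sorted(codes) and len(codes) <= 255

print(check_table(COUNTRIES))                          # True
print(all(check_table(s) for s in STATES.values()))    # True
```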

Let’s say that we’ve determined that user 139960061 lives in California, U.S., and
we want to store this information for that user. To do so, we first need to pack the
data into 2 bytes. The first byte is the code for the United States, which we can
calculate by finding the index of the United States’ ISO3 country code in our
COUNTRIES list. Similarly, if we have state information for a user, and we also have state
information for that country in our tables, the second byte is the code for the state,
which we can calculate by finding its index in the table. The next listing shows the
function for turning country/state information into a 2-byte code.

Listing 9.14 ISO3 country codes
import bisect

def get_code(country, state):
   cindex = bisect.bisect_left(COUNTRIES, country)

Find the offset for the country.

   if cindex >= len(COUNTRIES) or COUNTRIES[cindex] != country:
      cindex = -1

If the country isn’t found, then set its index to be -1.

   cindex += 1

Because uninitialized data in Redis will return as nulls, we want “not found” to be 0 and the first country to be 1.

   sindex = -1
   if state and country in STATES:
      states = STATES[country]

Pull the state information for the country, if it’s available.

      sindex = bisect.bisect_left(states, state)

Find the offset for the state.

      if sindex >= len(states) or states[sindex] != state:
         sindex = -1

Handle not-found states like we did countries.

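   sindex += 1

Keep not-found states at 0 and found states > 0.

   return chr(cindex) + chr(sindex)

The chr() function will turn an integer value of 0..255 into the single character with that same value. Under Python 2, that character is exactly 1 byte; under Python 3, chr() returns a Unicode string, so bytes([cindex, sindex]) would be needed to get 2 bytes.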
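To see the encoding in action, here is a minimal sketch that packs a location and unpacks it again. It assumes Python 3 and small stand-in tables rather than the full ones from listing 9.13. The names find_index, get_code_bytes, and decode are hypothetical helpers introduced for this example; get_code_bytes follows the same lookup logic as get_code() but returns a bytes object, and decode is one possible inverse, not something the listing defines.

```python
import bisect

# Abbreviated, sorted stand-ins for the tables in listing 9.13.
COUNTRIES = ['CAN', 'MEX', 'USA']
STATES = {
    'CAN': ['AB', 'BC', 'ON', 'QC'],
    'USA': ['AK', 'CA', 'NY', 'TX'],
}

def find_index(table, value):
    # 1-based position of value in a sorted table, or 0 if it is absent.
    index = bisect.bisect_left(table, value)
    if index < len(table) and table[index] == value:
        return index + 1
    return 0

def get_code_bytes(country, state):
    # Python 3 variant of get_code() that returns exactly 2 bytes.
    cindex = find_index(COUNTRIES, country)
    sindex = find_index(STATES.get(country, []), state) if state else 0
    return bytes([cindex, sindex])

def decode(code):
    # Hypothetical inverse: map a 2-byte code back to (country, state).
    cindex, sindex = code[0], code[1]
    country = COUNTRIES[cindex - 1] if 0 < cindex <= len(COUNTRIES) else None
    states = STATES.get(country, [])
    state = states[sindex - 1] if 0 < sindex <= len(states) else None
    return country, state

code = get_code_bytes('USA', 'CA')
print(code)                          # b'\x03\x02'
print(decode(code))                  # ('USA', 'CA')
print(get_code_bytes('FRA', 'CA'))   # b'\x00\x00' (country not in the table)
print(decode(b'\x00\x00'))           # (None, None)
```

Because 0 is reserved for “not found” in both bytes, a missing or unknown location decodes to (None, None) rather than to the first entry in either table.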

Location code calculation isn’t that interesting or difficult; it’s primarily a matter of
finding offsets in lookup tables, and then dealing with “not found” data. Next, let’s
talk about actually storing our packed location data.