Machine Learning with Datatron and Redis

Productionizing Machine Learning with Datatron and Redis

by Harish Doddi

Machine Learning is at the core of many modern systems today. Delivering more precise predictions and recommendations can make the difference between gaining or losing new customers and revenue.

Delivering real-time, low-latency predictions for mission-critical applications while also performing online learning, composite modeling and constant experimentation requires a powerful infrastructure. Building this pipeline on your own can be extremely challenging, but combining the Datatron and Redis Labs platforms simplifies both online learning and putting machine learning models into production.

Datatron Platform

Data scientists use Datatron Enterprise to develop and manage Machine Learning (ML) and Deep Learning models and get them into production faster. Data engineering, data science and DevOps teams can take advantage of both historical and real-time streaming data to collaborate and iterate on models faster in production.

figure-1

Datatron utilizes Redis Enterprise to scale machine learning pipelines in the following ways:

  1. Redis Enterprise as a data source: represented by 1 in figure-1
  2. Redis Enterprise as a caching data layer: represented by 2 in figure-1
  3. Redis Enterprise as a streaming data layer: represented by 3 in figure-1
  4. Redis Enterprise as an ML datatype: represented by 4 in figure-1

Redis Enterprise provides improved availability, scaling and cost efficiency over open source Redis, while enabling faster data ingestion for both offline and online learning as well as for serving dynamic recommendations through machine learning models.

When combined, Datatron and Redis Enterprise deliver a high-throughput, low-latency recommendation and prediction engine with unique online learning and model validation capabilities.

Fast Computation with Redis

Delivering real-time predictions with online learning often requires fast data ingestion. One example is counting events by geographic attribute: by neighborhood, zip code, city, state or country, over the last few minutes, hours, days and weeks.

For example, a few Datatron customers record the number of users in a specific region who opened the app in the last 20 minutes, and store it in the real-time data pipeline every minute. The location is represented as a geohash (Redis use case #4 in figure-1 above). Several geohash resolutions are recorded in the system; right now these are geohashes of length 4, 5, 6 and 7. At a high level, a geohash uses a string to represent a rectangular region of the earth. The longer the string, the smaller the region. For example, 9q8y covers almost the whole San Francisco region, 9q8yu is the Sunset, and 9q8yug is part of Golden Gate Park. 9q8yugn (Geohash 7) is about one street block. Using the data from the streaming service, we save all resolutions into Redis:

9q8y-ts1: {count:20}
9q8yu-ts1: {count:18}
9q8yug-ts1: {count:8}
9q8yugn-ts1: {count:2}

All of those entries auto-expire after a designated length of time (for example, 1 hour). In the above example, “ts1” is the timestamp at which the entry was created.
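The write path above can be sketched roughly as follows. This is a minimal illustration, not Datatron's actual code: the key layout (geohash, dash, timestamp) follows the entries shown above, but the JSON value encoding, the use of an integer minute bucket as the timestamp, and all function names are assumptions. The `InMemoryClient` class is a hypothetical stand-in so the sketch runs without a server; a redis-py client could be passed in its place, since `set(name, value, ex=...)` is the only call used.

```python
import json

RESOLUTIONS = (4, 5, 6, 7)   # geohash lengths kept in the pipeline
TTL_SECONDS = 3600           # example expiry of one hour


def entry_key(geohash: str, minute: int) -> str:
    """Build a key such as '9q8y-25000000' (geohash, then minute bucket)."""
    return f"{geohash}-{minute}"


def prefixes(geohash7: str) -> list[str]:
    """Return the regions enclosing a length-7 geohash, coarsest first."""
    return [geohash7[:n] for n in RESOLUTIONS]


def save_counts(client, counts: dict[str, int], minute: int) -> None:
    """Write one expiring entry per geohash.

    `client` only needs a redis-py style set(name, value, ex=seconds).
    """
    for geohash, count in counts.items():
        value = json.dumps({"count": count})
        client.set(entry_key(geohash, minute), value, ex=TTL_SECONDS)


class InMemoryClient:
    """Hypothetical stand-in for a Redis client (records values and TTLs)."""

    def __init__(self):
        self.data = {}
        self.ttl = {}

    def set(self, name, value, ex=None):
        self.data[name] = value
        self.ttl[name] = ex


client = InMemoryClient()
# Counts taken from the example entries above, one per resolution.
counts = dict(zip(prefixes("9q8yugn"), [20, 18, 8, 2]))
save_counts(client, counts, minute=25_000_000)
```

Letting Redis expire the entries keeps the writer simple: it never has to clean up old minute buckets itself.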

On the client side, we have to “guess” the timestamp in order to retrieve this information, so we’ll create a batch query. Assuming the current timestamp is ts2, we will query Redis for 9q8y-(ts2 – 1 minute), 9q8y-(ts2 – 2 minutes), 9q8y-(ts2 – 3 minutes), 9q8y-(ts2 – 4 minutes), and 9q8y-(ts2 – 5 minutes). From the returned results, we use only the entry whose timestamp is closest to ts2.
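A rough sketch of that batch lookup is shown here. It assumes timestamps are integer minute buckets and that values are stored as plain strings, so that a single redis-py `mget(keys)` call could fetch all candidates at once; neither detail is confirmed by the setup described above, and the function names are hypothetical.

```python
def candidate_keys(geohash: str, now_minute: int, lookback: int = 5) -> list[str]:
    """Keys for now-1 ... now-lookback minutes, most recent first."""
    return [f"{geohash}-{now_minute - i}" for i in range(1, lookback + 1)]


def closest_entry(keys, values):
    """Return the (key, value) pair nearest to now, or None if all are missing.

    `values` is aligned with `keys`; a missing entry is None, which is how
    redis-py's mget reports a key that does not exist.
    """
    for key, value in zip(keys, values):
        if value is not None:
            return key, value
    return None


keys = candidate_keys("9q8y", now_minute=25_000_005)
# With a live server this would be: values = client.mget(keys)
values = [None, None, '{"count": 20}', None, '{"count": 17}']
result = closest_entry(keys, values)
```

Because the keys are ordered from most recent to oldest, the first hit is the closest timestamp, and the whole lookup costs one round trip.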

The aggregation on a geohash can be optimized as well. For example, geohash 9q8yug can be divided into 32 Geohash 7 blocks, so its count can be derived from theirs, but we have to do this aggregation separately, outside of Redis. In the above example, count(9q8yug-ts1) = sum(count(9q8yug?-ts1)) where “?” can be b, c, f, g… etc. for a total of 32 options.
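The client-side aggregation can be sketched as below. The 32-symbol alphabet is the standard geohash base-32 alphabet; the helper names and the assumption that the child counts have already been fetched into a dictionary keyed like the Redis entries are illustrative only.

```python
# Standard geohash base-32 alphabet: digits plus letters without a, i, l, o.
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def children(geohash: str) -> list[str]:
    """Return the 32 geohashes one level finer than `geohash`."""
    return [geohash + symbol for symbol in GEOHASH_ALPHABET]


def aggregate_count(parent: str, minute: int, counts: dict[str, int]) -> int:
    """Sum the child counts for one minute bucket; missing children count as 0."""
    return sum(counts.get(f"{child}-{minute}", 0) for child in children(parent))


# Hypothetical counts already fetched for three of the 32 child blocks.
fetched = {"9q8yugn-100": 2, "9q8yugb-100": 5, "9q8yug0-100": 1}
total = aggregate_count("9q8yug", 100, fetched)
```

Treating absent children as zero matters here, since a block with no events in the window may have no entry at all.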

Online Learning with Redis

Figure 2

Redis also accelerates Datatron by moving computation closer to where the data is “born.” Often the machine learning model is trained offline in the data center and then shipped to the tower. The drawback of such a system is that the model doesn’t adapt to new data. In this scenario, online learning improves drastically as learning moves closer to the data. That is exactly what Redis Enterprise integrated with Datatron delivers.

To find out more about the Datatron platform and how it works with Redis, you can visit us at RedisConf 2018 in San Francisco in April. During our joint session titled “Using Redis as Real-Time Engine for Online Machine Learning,” Datatron and Redis Labs will detail how the architecture works to deliver fast and precise recommendations with advanced machine learning and AI pipelines.

 

——————————-
Guest Blog post:
Harish Doddi is Co-founder and CEO of Datatron Technologies. Previously, he held roles at Oracle; Twitter, where he worked on open source technologies, including Apache Cassandra and Apache Hadoop, and built Blobstore, Twitter’s photo storage platform; Snapchat, where he worked on the backend for Snapchat stories; and Lyft, where he worked on the surge pricing model. Harish holds a master’s degree in computer science from Stanford, where he focused on systems and databases, and an undergraduate degree in computer science from the International Institute of Information Technology in Hyderabad.