Spark and Redis

Analytics Made Lightning Fast

Apache Spark is a general framework for large-scale data processing. Combined with Redis, it delivers accelerated, real-time analytics for uses such as time-series analysis and machine-learning-driven predictions and recommendations.

Advantages of Using Redis with Spark

  • Redis can accelerate Spark performance by up to 50 times in several use cases, such as spark-timeseries. The Redis-Spark connector makes this possible by exposing the Redis data structures and API to Spark (see the benchmark below).
  • Redis provides a shared, distributed in-memory infrastructure for Spark.
  • Redis data structures allow individual elements of data to be accessed, which minimizes serialization/deserialization overhead and avoids transferring large chunks of data.
  • Redis Modules such as Redis-ML speed up delivery of models built with the Spark ML libraries. The models are stored natively in Redis, are reusable across many applications and languages, and are easily deployed into production with 5x to 10x lower execution latencies.
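The point about element-level access can be illustrated with a minimal sketch. It does not use the Redis-Spark connector or a real Redis server: FakeRedis is a hypothetical in-memory stand-in whose method names only mirror the Redis SET, GET, HSET and HGET commands, and the key names and series size are made up for illustration. The sketch compares reading one data point from a series stored as a single serialized blob with reading the same point from a hash that stores each point as its own field.

```python
import json


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis server (illustration only)."""

    def __init__(self):
        self._strings = {}
        self._hashes = {}

    def set(self, key, value):
        self._strings[key] = value

    def get(self, key):
        return self._strings[key]

    def hset(self, key, field, value):
        self._hashes.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self._hashes[key][field]


def read_point_from_blob(store, key, field):
    # The whole series is fetched and deserialized to read a single point.
    payload = store.get(key)
    return json.loads(payload)[field], len(payload)


def read_point_from_hash(store, key, field):
    # Only the requested field is fetched and parsed.
    payload = store.hget(key, field)
    return float(payload.decode()), len(payload)


store = FakeRedis()
series = {f"t{i}": float(i) for i in range(1000)}  # made-up time series

# Layout 1: one opaque blob holding the whole series.
store.set("series:blob", json.dumps(series).encode())

# Layout 2: one hash field per data point.
for field, value in series.items():
    store.hset("series:hash", field, repr(value).encode())

blob_value, blob_bytes = read_point_from_blob(store, "series:blob", "t500")
hash_value, hash_bytes = read_point_from_hash(store, "series:hash", "t500")

print(f"blob read: value={blob_value}, bytes fetched={blob_bytes}")
print(f"hash read: value={hash_value}, bytes fetched={hash_bytes}")
```

Both reads return the same value, but the hash read fetches only the bytes of one field, while the blob read fetches and parses the entire series. The sketch measures payload size only; it says nothing about the latency figures quoted above.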

Databricks’ Spark platform is now integrated with Redis Labs’ Redis Cloud. A Databricks Spark notebook describes how to connect to Redis Cloud.

[Figure: Spark and Redis benchmark graph]

Related Resources

Getting started with Spark on Redis
Faster Operational Analytics on Spark with Redis