Getting Started with Spark and Redis

Redis Labs recently published a spark-redis package for general public consumption. It is, as the name may suggest, a Redis connector for Apache Spark that provides read and write access to all of Redis’ core data structures as RDDs (Resilient Distributed Datasets, in Spark terminology).
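As a sketch of what reading and writing through the connector looks like, the spark-redis API exposes Redis operations as methods on the SparkContext. The host, port, key names and values below are illustrative assumptions (a local Redis server on the default port), not part of the package's requirements:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.redislabs.provider.redis._ // brings the fromRedis*/toRedis* methods into scope

object SparkRedisSketch {
  def main(args: Array[String]): Unit = {
    // Point Spark at the Redis server (assumed here to be local, on the default port)
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("spark-redis-sketch")
      .set("redis.host", "localhost")
      .set("redis.port", "6379")
    val sc = new SparkContext(conf)

    // Write an RDD of (key, value) pairs into Redis as plain strings
    val scores = sc.parallelize(Seq(("score:alice", "95"), ("score:bob", "87")))
    sc.toRedisKV(scores)

    // Read back every key matching a pattern, as an RDD of (key, value) pairs
    val readBack = sc.fromRedisKV("score:*")
    readBack.collect().foreach { case (k, v) => println(s"$k -> $v") }

    sc.stop()
  }
}
```

Analogous methods cover the other core structures (for example, `toRedisHASH`, `fromRedisList`, `fromRedisZSet`), so each Redis data type maps onto an RDD in the same style.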

Since its introduction, Spark has caught developers' attention as a fast, general engine for large-scale data processing, easily surpassing alternative big data frameworks in the range of analytics that can be executed on a single platform. Spark supports cyclic data flow and in-memory computing, allowing programs to run faster than Hadoop MapReduce. With its ease of use and its libraries for SQL, streaming and machine learning, it has ignited interest in a wide developer community. Redis brings a shared in-memory infrastructure to Spark, allowing it to process data orders of magnitude faster. Redis data structures simplify data access and processing, reducing code complexity and saving on network bandwidth. The combination of Spark and Redis fast-tracks your analytics, allowing unprecedented real-time processing of very large datasets.

This white paper outlines the steps needed to get started with Apache Spark and Redis.