KubeCon 2019: Building the Data Layer on Kubernetes

by Aaron Sun

If you were in San Diego for KubeCon + CloudNativeCon North America 2019, you've probably already heard about the latest developments in the cloud-native ecosystem and the major updates coming in the next version of Kubernetes. I was in awe of the energy and excitement at the conference. There was plenty to see at the sponsor showcase, breakout sessions, and keynotes, all of which demonstrated just how far the community has come since the inception of the Cloud Native Computing Foundation (CNCF), and where it can continue to grow.

I was particularly interested in understanding how practitioners are approaching the data layer when they’re running applications on Kubernetes. What’s the state of Kubernetes in this area, and how are people tackling Kubernetes-related data challenges? 

Kubernetes is fueling digital transformation

As cliché as it sounds, digital transformation is happening everywhere, and organizations are searching for ways to deliver instant, connected experiences to customers faster than ever before. They’re modernizing their applications by adopting new architectural patterns, new technologies, and new organizational practices.

As part of this process, practitioners have turned to containers and Kubernetes for streamlining application deployments. Kubernetes is a system that automates the deployment, scheduling, and management of containerized applications via declarative configuration and an API, with a number of benefits that make it especially attractive for running these applications: 

  1. Increased operational efficiency: Kubernetes is designed to automatically perform many operational tasks that would otherwise require manual effort, including load balancing, storage orchestration, container rollouts, and configuration management. 
  2. Systems resilience: Kubernetes is designed to be a self-healing system, and will automatically restart or replace application containers that become unresponsive or experience failures. 
  3. Flexibility and extensibility: The Kubernetes API makes it incredibly easy for external systems to interact with Kubernetes, and lets developers write software that extends Kubernetes even further.
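
To make the declarative, self-healing idea a little more concrete, here is a minimal sketch of a reconciliation loop in Python. It is a toy model rather than Kubernetes code: the Cluster class, the reconcile function, and the "web-N" replica names are all hypothetical, and the only assumption is that the system repeatedly compares the state you declared with the state it observes and acts on the difference.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: tracks which replicas are running."""
    running: set = field(default_factory=set)


def reconcile(desired_replicas: int, cluster: Cluster) -> list:
    """Compare desired state with observed state and return the actions taken."""
    actions = []
    wanted = {f"web-{i}" for i in range(desired_replicas)}  # hypothetical names
    # Start anything that should be running but is not.
    for name in sorted(wanted - cluster.running):
        cluster.running.add(name)
        actions.append(f"start {name}")
    # Stop anything that is running but no longer wanted.
    for name in sorted(cluster.running - wanted):
        cluster.running.remove(name)
        actions.append(f"stop {name}")
    return actions


cluster = Cluster()
print(reconcile(3, cluster))       # first pass starts three replicas
cluster.running.discard("web-1")   # simulate a crashed container
print(reconcile(3, cluster))       # next pass replaces only the missing replica
```

The point of the sketch is that nobody tells the system to "restart web-1". You declare that three replicas should exist, and the loop works out what to do whenever reality drifts from that declaration.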

The impact of Kubernetes on the data layer

Getting the data layer right is critical to building a cloud-native application. While a monolithic application might have a single database, developers now have to carefully choose the right data model for multiple services, manage separate database instances for each service, and minimize excessive database calls so that applications continue to perform quickly. 

Kubernetes adds even more complexity because it forces people to think about storage and state management in an environment where containers are automatically started if an application needs to scale up, or terminated when they experience failures. 

While this is perfectly acceptable for stateless applications, stateful workloads are usually long-lived and often require data to be passed across a series of discrete steps. For example, adding shards to a database cluster or upgrading versions requires specific knowledge that Kubernetes doesn’t have out of the box. Running stateful workloads in Kubernetes traditionally required a human operator to manually configure and deploy many different Kubernetes objects (StatefulSets, PersistentVolumes, PersistentVolumeClaims, etc.) to make sure that Kubernetes followed the right steps to maintain state and persisted data correctly in the event of a failure. 
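
To give a sense of what that manual configuration involves, the sketch below builds a bare-bones StatefulSet manifest as a Python dictionary and prints it as JSON. The field names follow the apps/v1 StatefulSet schema, but everything else is an assumption made for illustration: the statefulset_manifest helper, the "example-db" name, the "example/db:1.0" image, the /data mount path, and the 10Gi size are all hypothetical, and a real deployment would also need a headless Service, resource limits, and probes.

```python
import json


def statefulset_manifest(name: str, image: str, replicas: int, storage: str) -> dict:
    """Build a minimal StatefulSet manifest (illustrative, not production-ready)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Name of the headless Service that gives each pod a stable DNS name.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,  # hypothetical image
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }]
                },
            },
            # Each replica gets its own PersistentVolumeClaim from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


manifest = statefulset_manifest("example-db", "example/db:1.0", replicas=3, storage="10Gi")
print(json.dumps(manifest, indent=2))
```

Even this stripped-down object only covers stable identity and per-replica storage. It says nothing about how to add a shard or upgrade a version safely, which is the knowledge gap described above.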

How enterprises are solving data challenges with Kubernetes

PlanetScale shares their journey to build a database as a service on Kubernetes using Operators

People are just beginning to figure out stateful services and storage in Kubernetes. Many Kubernetes experts advise against running production stateful workloads in Kubernetes unless you really know what you’re doing. Most organizations start by putting only their stateless workloads in Kubernetes, or by disabling horizontal pod autoscaling. 

Organizations with significant engineering resources and experience running Kubernetes often build their own services designed to orchestrate stateful workloads using a combination of Kubernetes controllers and related software, like etcd. Lyft, for example, built Flyte, an open-source platform for orchestrating its data science and machine learning workflows. 

Finally, a growing number of individuals and software vendors are writing Kubernetes Operators for databases and other stateful workloads, and there were multiple sessions at KubeCon covering how to write and use them. The proliferation of Kubernetes Operators is a sign that it will become even easier to deploy these workloads and automate their lifecycles using Kubernetes. 
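
As a rough illustration of what an Operator adds, the sketch below encodes a small piece of database-specific knowledge as a planning function. It is a toy model and not taken from any real Operator: the plan_steps function, the "shards" and "version" fields, and the step descriptions are all hypothetical, and a real Operator would watch a custom resource through the Kubernetes API and carry these steps out rather than print them.

```python
def plan_steps(observed: dict, desired: dict) -> list:
    """Return the ordered steps a hypothetical database Operator would take."""
    steps = []
    # Upgrade existing shards one at a time so the cluster stays available.
    if observed["version"] != desired["version"]:
        for shard in range(observed["shards"]):
            steps.append(f"upgrade shard-{shard} to {desired['version']}")
    # Create any new shards, then move data onto them.
    if desired["shards"] > observed["shards"]:
        for shard in range(observed["shards"], desired["shards"]):
            steps.append(f"create shard-{shard}")
        steps.append("rebalance data across shards")
    return steps


observed = {"shards": 2, "version": "5.0"}
desired = {"shards": 3, "version": "5.1"}
for step in plan_steps(observed, desired):
    print(step)
```

The ordering is the interesting part. Upgrading shard by shard and rebalancing only after new shards exist is the kind of operational knowledge a human operator would otherwise have to apply by hand.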

Exciting times ahead

A crowd gathers at the Redis Labs booth

All signs pointed to opportunity at KubeCon, and I look forward to seeing how the Kubernetes ecosystem continues to evolve. In another blog post, I’ll talk about some of the most common Kubernetes-related questions we received at the Redis Labs KubeCon booth, and how you can use Redis Enterprise to solve your data challenges on Kubernetes.