
Kafka on Kubernetes: Pros and cons
Understand the risks and rewards of data streaming with Kubernetes.

"But I want to run KafkaⓇ on Kubernetes. I want to run everything on Kubernetes."
It was hard not to roll my eyes when Karl said that to me. We were sitting outside of a Starbucks near my house, enjoying the morning air before the summer heat drove us back inside. He was drinking something green and healthy, and I was drinking a black iced americano. I took another sip while I counted slowly in my mind. One...two...three...and then the words fell out of my mouth, unbidden in their cold, honest truth.
"In nearly every situation, the benefit of running Kafka on Kubernetes isn't worth the trouble. I won't go so far as to say that it's wrong, but it's not far from it."
The place where Karl works is one of the companies that still hasn't made the leap to modern infrastructure. They have a large, legacy application, and Karl was hired recently as a DevOps Manager to help them with "digital transformation" and the "move to cloud native." Like many companies on that journey, he works with people who believe and also with people who are just going along with the industry trend. Karl believes. He's a Kubernetes fanboy to the core, and hearing me throw shade on Kubernetes makes him narrow his eyes.
"Running apps on Kubernetes is better than running them any other way."
But is it?
There are two types of applications that run inside of Kubernetes: cloud native apps that are purpose-built for a microservices environment like Kubernetes, and applications that were ported to that environment. For the second group, the performance of the application varies wildly. A web application with a static frontend and a PHP backend is an easy lift. For more complex applications that have expectations about the direct availability of their processes, the story is different.
What benefits does Kubernetes provide? #
"What benefit does a container provide?" I asked Karl.
He leaned back in his chair, confident that he was going to win the debate.
"Containers let you build an application once and then run it anywhere with a compatible container engine. They eliminate issues with dependencies in a runtime environment."
I nodded, and he looked up at the patio cover above us and thought for a moment. "They isolate processes without adding significant latency over running the application directly on a host," he continued. They increase application density, allowing you to make better use of the resources of a host. That gives you better ROI for your infrastructure."
"But you can't just run Kafka in a container," he said, thinking that's what I was going to suggest. "You need Kubernetes to orchestrate it."
Kubernetes connects the container runtime interface (CRI) with the container network interface (CNI) and the container storage interface (CSI), and then it provides the plumbing and glue to turn one or more containers into an application.
An application is more than software running in a container or on a host. It's everything involved in getting user traffic to that software and back again. Software by itself doesn't do anything. It has to be installed. It has to be configured. It has to be connected to the outside world. Developers shouldn't have to care about any of that. They should just be able to push code out to a repository, and something else should deploy it and make it available.
How to model Kafka on Kubernetes #
"Look," I said. "What we do with Kubernetes isn't magic. All we've done is virtualize the way that we used to do things in hardware. Kubernetes is like a mini datacenter, where each node is a rack, and each Pod is a server. It comes with all the switching, cabling, and routing pre-installed, so when you deploy your workloads, everything just works. Or at least it's supposed to."
I grabbed a clean napkin and started drawing on it.
What Kafka looks like on real hardware
Apache Kafka (currently) requires two components: the Kafka cluster itself, and a ZooKeeperⓇ cluster to handle elections, membership, service state, configuration data, ACLs, and quotas. If we were building these on real hardware, we might have a 3-node Kafka cluster and a 3-node ZooKeeper cluster. Each Kafka broker would have its own low-latency disks, lots of RAM for caching, and a reasonable amount of CPU cores. It would also know about each ZooKeeper node, and each producer and consumer would know about each Kafka broker.
This is a crucial part of how Kafka works -- producers and consumers have a list of bootstrap addresses to connect to. When they start, they connect to one of these addresses and are then given a specific broker to connect to for the partition they're working with. If that broker goes down, they'll reconnect to the bootstrap nodes and be assigned to a different broker. The communication between Kafka and ZooKeeper is similar -- each broker will connect to one of the ZooKeeper nodes in its list, and if that node fails, the broker will connect to a different node.
This awareness and automatic healing means that you never put a load balancer in front of Kafka or between Kafka and ZooKeeper.

What Kafka looks like on Kubernetes
If we take what we've defined above and map it to Kubernetes resources, we end up with one StatefulSet for ZooKeeper and another for Kafka, each with three replicas. StatefulSets are a Kubernetes resource designed for stateful applications, and they guarantee:
- Pods will have stable, unique names and network identifiers, and stable, persistent storage that moves with the Pod when it's scheduled to any other node.
- Pods will be created sequentially (0, 1, 2...n) and terminated in reverse order (n...2, 1, 0).
- Rolling updates will be applied in termination order
- All predecessors in the StatefulSet must be Running and Ready before successive Pods are created.
- Before Kubernetes terminates a Pod, all its successors must be completely shut down.
A StatefulSet requires a headless Service to control the domain of the Pod. Where a normal Service presents an IP and load balances the Pods behind it, a headless Service returns the IPs of all its Pods in response to a DNS query. When a headless Service sits in front of a StatefulSet, Kubernetes takes this one step further and allows DNS queries for the Pod name as part of the Service domain name.
For example, imagine that we have a StatefulSet named kafka
with three replicas, running in the namespace production
. The Pods would be named kafka-0
, kafka-1
, and kafka-2
. If we put a headless Service called kafka
in front of them, then a DNS request for kafka.production.svc.cluster.local
would return all three Pod IPs. We could also query DNS for kafka-0.kafka.production.svc.cluster.local
to get the IP for kafka-0
, and so on.
If we did the same with ZooKeeper, we might end up with zk-0
through zk-2
as part of zookeeper.production.svc.cluster.local
.
Our producers and consumers would use each of the Kafka replicas in their bootstrap configuration, and each of the Kafka brokers would use the ZooKeeper replicas as well.
It's not all doom and gloom #
"The problem isn't entirely with Kubernetes, Karl. The problem is also with Kafka."
Karl looked at me and scoffed.
"No, seriously," I pressed. "What if you could replace Kafka with something else, without changing your existing code, and solve the problems that I just described?"
I had his attention.
"You're not going to fix the networking issues of exposing a broker outside of the cluster, but if your producers and consumers are located inside of the same Kubernetes cluster, we can make it into a robust solution with Redpanda."
Unlike Kafka, Redpanda communicates with RAM via direct memory access (DMA) and bypasses the filesystem cache for reads and writes. It aligns memory according to the layout of the filesystem, so very little data is flushed to the actual physical device. The resources allocated to Redpanda in the Kubernetes manifest are reserved for Redpanda and more efficient than relying on the filesystem cache. Along with its single-binary architecture that doesn't use Java or ZooKeeper, this also makes it better suited for running in a container. (Learn how to start using Redpanda in Kubernetes here.)
Redpanda's Kubernetes manifests also take brokers into and out of maintenance mode as part of the Kubernetes shutdown process, ensuring that other brokers take over as partition leaders and that the broker is drained of producers and consumers before the Pod terminates. Upon returning to service, the broker's DMA cache is primed as it catches up with the partition leaders before being eligible to become a leader itself.
This is a small part of what makes Redpanda the modern streaming data platform.
Kubernetes is an answer, not the answer #
I love Kubernetes. It's a powerful tool that solves a myriad of problems, but it isn't a panacea. Not everything that can run on Kubernetes should run on Kubernetes, and if you’re evaluating whether to run Kafka or Redpanda on Kubernetes, consider how you’re going to interact with it from outside of the cluster. You may have to take steps to tell the brokers what their advertised IP is, and you may have to resort to manual configuration of the network stack to preserve the behavior that exists in a non-Kubernetes deployment.
Ironically, if you run Kafka or Redpanda outside of Kubernetes, you can still use it with producers and consumers that run inside of a Kubernetes cluster. It's only the other way around that's a challenge.
Head over to our Get Started page to learn more about Redpanda and start using it for free. For more free education on Streaming Fundamentals and Getting Started with Redpanda, visit the Redpanda University. To see what other members of the Redpanda Community are building, join us on Slack!
Let's keep in touch
Subscribe and never miss another blog post, announcement, or community event. We hate spam and will never sell your contact information.