ZooKeeper to KRaft migration: a brief overview and a simpler alternative

A summary of the process and an easier path for those who want to skip the struggle

July 30, 2024

In an Apache Kafka® cluster, Apache ZooKeeper™ is a centralized service that enables Kafka brokers, producers, and consumers to form a coherent distributed system. It maintains the cluster metadata, elects leaders, and facilitates coordination among various Kafka components.

However, the Apache Kafka community began deprecating and removing the ZooKeeper dependency in 2019 with KIP-500 (Kafka Improvement Proposal 500). Apache Kafka 4.0, scheduled for release in 2024, is planned to completely remove ZooKeeper and make KRaft mode (ZooKeeper-less mode) the default.

The Kafka community has since outlined a migration path to help users transition their Kafka cluster from ZooKeeper mode to KRaft mode. However, the path is “dark and full of terrors.” (No really, it’s complex and peppered with limitations.)

In this post, we walk down the ZooKeeper to KRaft migration path so you can fully understand what’s involved. If you think you might want to take a different road, we propose a vastly simpler migration path to a much simpler streaming data platform.

Let’s get into it.

What is KRaft and why migrate to it?

KRaft (Kafka Raft metadata mode) is a new mode introduced in Kafka to manage the Kafka metadata. It removes the dependency on ZooKeeper and makes Kafka a more self-contained system. Instead of using ZooKeeper for handling metadata, KRaft uses the Raft consensus protocol directly within Kafka.

In a nutshell, removing the ZooKeeper dependency simplifies the overall infrastructure design, improves scalability and performance, reduces operational complexity, and allows Kafka to leverage its own architecture more efficiently for metadata management.

ZooKeeper to KRaft migration: an overview

To make sure we’re all on the same page, let’s briefly go over what “migrating from ZooKeeper to KRaft” actually means.

In a Kafka cluster, brokers in ZooKeeper mode (ZK mode) store their metadata in Apache ZooKeeper. This is the old mode of handling metadata. Brokers in KRaft mode store their metadata in a KRaft quorum, which is the new and improved mode of handling metadata.

So, migration is the process of moving cluster metadata from ZooKeeper into a KRaft quorum.

Unfortunately, this migration is not a one-off task. It's a phased approach involving several critical stages. Each stage must be carefully planned, executed, and tested.

In this post, we break the migration process into several stages and walk you through each one, detailing how it is supposed to look at a high level. Keep in mind that this is not a detailed migration guide. For an in-depth guide, check the official ZooKeeper to KRaft migration documentation.

1. Prerequisites, planning, and preparation

Migrating from ZooKeeper to KRaft mode is a significant change. It requires careful planning and preparation beforehand to minimize the risk of potential failures.

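First, ensure you’re using a Kafka version that supports the migration. KRaft mode first appeared as early access in Apache Kafka 2.8.0, but the ZooKeeper to KRaft migration itself was introduced later (as early access in Kafka 3.4, under KIP-866), so check the official documentation for the minimum version your cluster needs. You don’t want to lose any data if the migration fails, so take a full backup of your current ZooKeeper and Kafka cluster data. Also document the number of brokers, topics, and partitions, along with any custom configurations.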
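One lightweight way to document the cluster before you touch anything is to capture it in a small snapshot you can keep under version control. The sketch below is only an illustration: the `ClusterInventory` class, its field names, and the sample values are all made up for this example, and in practice you’d fill them in from your own tooling.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ClusterInventory:
    """Hypothetical pre-migration record of the cluster's shape."""
    kafka_version: str
    broker_ids: list
    topics: dict                      # topic name -> partition count
    custom_configs: dict = field(default_factory=dict)

    def total_partitions(self):
        # Sum of partition counts across every recorded topic.
        return sum(self.topics.values())

    def to_json(self):
        # Sorted keys keep the snapshot stable, so diffs stay readable.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Sample values only; replace with what your cluster actually reports.
inventory = ClusterInventory(
    kafka_version="3.6.1",
    broker_ids=[1, 2, 3],
    topics={"orders": 12, "payments": 6},
    custom_configs={"log.retention.hours": "72"},
)
snapshot = inventory.to_json()
```

Saving `snapshot` alongside your backups gives you something concrete to compare against once the migration is done.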

For dry runs, it’s recommended that you set up a testing environment that mirrors your production environment. For detailed instructions, review the official Kafka documentation and KRaft migration guidelines and make several trial migrations in the test environment. This will build your confidence and help you catch issues quickly.

Finally, decide whether to perform a rolling migration or a full cluster restart. Schedule a maintenance window for minimal disruption.

At this point, your Kafka brokers should be operating in ZooKeeper mode and connected to the ZooKeeper ensemble, which is used to store metadata.

Kafka cluster is configured in ZooKeeper mode

2. Deploying the KRaft Controller Quorum

We begin the migration by deploying the KRaft controller quorum. From now on, these controllers will maintain the cluster metadata in KRaft mode.

How many KRaft controller nodes would you need? It will typically match the number of ZooKeeper nodes currently running.
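If you’re wondering why controller counts are usually odd, the arithmetic behind a Raft majority helps. This is a minimal sketch of that arithmetic only (the function names are ours, not part of Kafka):

```python
def majority(voters):
    """Smallest number of voters that forms a Raft majority."""
    return voters // 2 + 1


def tolerated_failures(voters):
    """How many voters can be lost while a majority still exists."""
    return voters - majority(voters)


for voters in (3, 4, 5):
    print(voters, majority(voters), tolerated_failures(voters))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that going from three to four controllers buys no extra fault tolerance, which is why three or five are the usual choices.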

We configure controller nodes with the flag zookeeper.metadata.migration.enable=true, indicating the intention to migrate the metadata from ZooKeeper to KRaft mode. When this flag is set to true, KRaft controllers will start, form a quorum, elect a leader, and wait for brokers to register, effectively starting the migration process.
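To make that concrete, here’s a rough sketch that renders a controller properties file with the migration flag turned on. The property keys are standard Kafka settings, but the host names, ports, node IDs, and the `render_properties` helper are placeholders we made up for illustration, so check the official migration documentation for the exact set your version needs.

```python
def render_properties(settings):
    """Render a dict as Java-style key=value properties text."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Hypothetical three-node controller quorum; hosts and IDs are placeholders.
controller_settings = {
    "process.roles": "controller",
    "node.id": "3000",
    "controller.quorum.voters": (
        "3000@controller-1:9093,3001@controller-2:9093,3002@controller-3:9093"
    ),
    "controller.listener.names": "CONTROLLER",
    "listeners": "CONTROLLER://:9093",
    # Signals the intention to migrate metadata out of ZooKeeper.
    "zookeeper.metadata.migration.enable": "true",
    # The controllers still need to reach ZooKeeper to read existing metadata.
    "zookeeper.connect": "zk-1:2181,zk-2:2181,zk-3:2181",
    # Listener the controllers use when talking to the existing brokers.
    "inter.broker.listener.name": "PLAINTEXT",
}

controller_properties = render_properties(controller_settings)
```

Each controller would get the same file apart from its own `node.id`.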

KRaft controllers form a quorum, elect a leader, and wait for broker connections

3. Enabling brokers for migration

With the KRaft controller nodes set up and waiting for the brokers, the next step is to configure the brokers for migration mode.

We update the broker configurations by adding the KRaft controller quorum connection details first. Then we set the zookeeper.metadata.migration.enable=true flag in each broker to enable the metadata migration.
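As a sketch of what that broker-side change amounts to, the helper below layers the quorum details and the migration flag on top of an existing broker configuration. The `enable_migration` function and the sample hosts are hypothetical; the property keys are the real Kafka ones. Depending on your setup you may also need to map the controller listener in `listener.security.protocol.map`, which this sketch leaves out.

```python
def enable_migration(broker_settings, quorum_voters, controller_listener="CONTROLLER"):
    """Return a copy of the broker settings prepared for migration mode."""
    updated = dict(broker_settings)  # leave the caller's dict untouched
    # First, tell the broker how to reach the KRaft controller quorum.
    updated["controller.quorum.voters"] = quorum_voters
    updated["controller.listener.names"] = controller_listener
    # Then switch on the metadata migration.
    updated["zookeeper.metadata.migration.enable"] = "true"
    return updated


# Placeholder broker configuration, still in ZooKeeper mode.
zk_broker = {
    "broker.id": "0",
    "listeners": "PLAINTEXT://:9092",
    "zookeeper.connect": "zk-1:2181,zk-2:2181,zk-3:2181",
}

migrating_broker = enable_migration(
    zk_broker,
    quorum_voters="3000@controller-1:9093,3001@controller-2:9093,3002@controller-3:9093",
)
```

The ZooKeeper connection stays in place at this stage, since the broker is still running in ZooKeeper mode.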

After brokers are updated, perform a rolling restart of the brokers one by one to apply the configuration changes. As brokers restart, they will register with the KRaft controller quorum, and the migration will begin.
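The rolling restart itself follows a simple pattern: restart one broker, wait until it’s healthy again, and only then move on. Here’s a minimal sketch of that loop. How you actually restart a broker and decide it’s healthy depends on your environment, so `restart` and `is_healthy` are callables you’d supply yourself, and the timeout values are arbitrary.

```python
import time


def rolling_restart(broker_ids, restart, is_healthy,
                    timeout_s=300.0, poll_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Restart brokers one at a time, waiting for each to recover."""
    for broker_id in broker_ids:
        restart(broker_id)
        deadline = clock() + timeout_s
        # Never touch the next broker until this one reports healthy.
        while not is_healthy(broker_id):
            if clock() >= deadline:
                raise TimeoutError(
                    f"broker {broker_id} did not become healthy in {timeout_s}s"
                )
            sleep(poll_s)
```

Stopping at the first broker that fails to come back is deliberate: it keeps a bad configuration from spreading to the rest of the cluster.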

At this point, the KRaft controller leader copies all metadata from ZooKeeper to the __cluster_metadata topic.

When the brokers connect to the KRaft controllers, the metadata migration starts

4. Restarting brokers in KRaft mode

While the migration is in progress, you can check the current status of the process by reading the JMX MBean attribute kafka.controller:type=KafkaController,name=ZkMigrationState. When the migration is complete, the metric value will change to MIGRATION_COMPLETED.

When the migration is finished, the brokers are still running in ZooKeeper mode, and the cluster is working in a “dual-write” state. That means the metadata updates are still copied to ZooKeeper while KRaft controllers handle them.

At this stage, you no longer require ZooKeeper in the cluster. You can eliminate the ZooKeeper dependency by updating the broker configurations, removing the ZooKeeper connection details, and disabling the migration flag. Perform another rolling restart of the brokers. They will now run in full KRaft mode without ZooKeeper.
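Here’s a rough sketch of that broker configuration change, expressed as a function over a settings dictionary. The `to_kraft_broker` helper and the sample values are hypothetical. Beyond what’s described above, it also assumes you want to drop `inter.broker.protocol.version` (KRaft tracks its feature level through `metadata.version` instead) and to carry the old `broker.id` over as `node.id`, so double-check both against the official guide.

```python
def to_kraft_broker(broker_settings):
    """Return a copy of migration-mode broker settings for full KRaft mode."""
    updated = {
        key: value
        for key, value in broker_settings.items()
        # Drops zookeeper.connect, the migration flag, and any other
        # zookeeper.* client settings in one pass.
        if not key.startswith("zookeeper.")
    }
    # Assumption: KRaft uses metadata.version rather than the IBP setting.
    updated.pop("inter.broker.protocol.version", None)
    # Assumption: the broker keeps its numeric identity under the KRaft key.
    if "broker.id" in updated:
        updated["node.id"] = updated.pop("broker.id")
    updated["process.roles"] = "broker"
    return updated


# Placeholder settings for a broker that has finished the metadata migration.
migrating_broker = {
    "broker.id": "0",
    "listeners": "PLAINTEXT://:9092",
    "inter.broker.protocol.version": "3.6",
    "zookeeper.connect": "zk-1:2181,zk-2:2181,zk-3:2181",
    "zookeeper.metadata.migration.enable": "true",
    "controller.quorum.voters": "3000@controller-1:9093",
    "controller.listener.names": "CONTROLLER",
}

kraft_broker = to_kraft_broker(migrating_broker)
```

The controller quorum settings are kept as they are, since the broker still needs them to find the controllers.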

Metadata is still copied to ZooKeeper

5. Restarting KRaft controllers

It’s the final countdown. Now you’ll remove the ZooKeeper connection details from KRaft controllers and disable the migration flag.

Perform a rolling restart of the KRaft controllers. The cluster now runs entirely in KRaft mode without ZooKeeper.

ZooKeeper can be decommissioned at this stage

6. Migration verification and post-migration activities

Lastly, verify that the cluster is stable by running a few post-migration tests.

For example, you can start by producing and consuming messages to topics followed by topic creation, deletion, and other administrative tasks. While doing so, monitor logs and metrics to confirm there are no issues.

Once the cluster is stable and you’re confident in the KRaft setup, decommission the ZooKeeper nodes. Finally, update all relevant documentation with the new architecture and configuration details.

Seems simple enough. What’s the problem?

You just walked through our super-simplified take on the migration process (compared to the rather dense version in the official guide). However, beneath that simplicity is the pressure of getting it right in one go. There are several caveats to keep in mind.

  • The complexity of the end-to-end process. The migration guide targets recent Kafka versions. In reality, many users are still running older versions of Kafka and ZooKeeper and must upgrade both before they can even begin. In fact, the migration also requires an up-to-date version of ZooKeeper! The migration itself then requires three rounds of cluster restarts, and users might run into unexpected problems after any of them. The complexity looms larger because there’s little room to back out. For example, while the migration is in progress, you can’t change the metadata version as it might break the cluster. To make things worse, if something goes wrong after the migration, you can’t revert to ZooKeeper mode: this is a one-way operation. Additionally, some features are missing in KRaft mode, so if you’re using any of them, you can’t migrate to KRaft yet.
  • No automation scripts or playbooks are available for migration yet. While self-hosted open-source users absorb the pain of manual migration, do fully managed Kafka platforms offer migration assistance? It appears that Amazon MSK doesn’t support the ZooKeeper to KRaft migration, as KRaft mode is only available for newly created clusters. You cannot switch metadata modes once the cluster is created, which will be a massive headache for MSK users.
  • Climbing costs are also a factor. While the KRaft mode dropped ZooKeeper, it requires dedicated Controller nodes for metadata management, which you must maintain over time. This doesn’t exactly reduce infrastructure spending.

At this point, you might be thinking, “Okay, I’ll bite. What’s the simpler alternative without the hassle of switching to KRaft mode or dealing with the cost and complexity of Kafka?"

Glad you asked! Meet Redpanda.

Redpanda: a "pawsitive" advantage for seamless streaming data

Redpanda is a unified streaming data platform built in C++ and designed as a simpler, more performant, and cost-efficient alternative to Kafka.

Unlike Kafka, Redpanda is completely free from external dependencies like ZooKeeper and the JVM. Instead, Redpanda implements the Raft consensus algorithm directly within its own architecture. This implementation handles coordination and leader election among the nodes in a Redpanda cluster, ensuring that data is consistently replicated across all nodes. By eliminating the dependency on ZooKeeper, Redpanda significantly simplifies and strengthens its architecture. In addition, Redpanda comes packaged with all the tools a developer needs to build powerful streaming data applications, like its own CLI, the dev-friendly Redpanda Console, and over 200 pre-built connectors that you can plug and play in the blink of an eye.

The bottom line is that while Kafka is a powerful and mature platform, Redpanda offers a more streamlined, efficient, and dev-friendly end-to-end alternative that can make your job easier and your streaming workloads ridiculously efficient.

If that sounds like a sweet deal, then take a look at our blog on migrating from Kafka to Redpanda. It provides step-by-step instructions and a helpful pre-migration checklist to set you up for a smooth transition. For a detailed guide and common questions, download our free migration guide.
