The Kafka cloud—options and best practices

Managed Kafka

Self-managed Kafka means you are responsible for deploying and configuring the Kafka cluster along with its underlying infrastructure. You are also responsible for managing configurations for varying workloads across brokers and partitions, which becomes challenging at scale.

In contrast, managed Kafka solutions allow you to focus on building data pipelines instead of getting bogged down with the operational details of running Kafka. They provide varying degrees of support—from managing underlying infrastructure to software updates and cluster maintenance.

This article explores how managed Kafka services assist organizations in leveraging Kafka functionalities while reducing the need to manage underlying clusters.

Summary of key managed Kafka benefits

  • Simplified deployment and management: Automates setting up and maintaining the Kafka cluster, reducing manual work. You can create managed clusters in just a few clicks.
  • High availability and reliability: Ensures the Kafka cluster continues operating even during outages.
  • Management costs: Provides automated tools for provisioning, upgrades, and patching to significantly reduce management costs.
  • Security and compliance: Provides built-in security measures and adheres to industry regulations for data protection.
  • Cost efficiency: Optimizes cost with pay-as-you-go pricing based on resource usage.
  • Operational focus: Handles the complexities of running Kafka, allowing your team to concentrate on building data pipelines and delivering business value.

Why choose managed Kafka over self-managed Kafka?

Self-managed Kafka gives control over infrastructure but comes at the cost of significant operational overhead. Managed Kafka addresses management challenges and helps engineers focus more on Kafka functionalities.

Managed Kafka services automate several aspects of Kafka cluster management

When considering whether to opt for a managed Kafka solution or to self-manage a Kafka cluster, the decision often hinges on several critical factors.

Simplified deployment and management

Managed Kafka lets you provide the desired configurations and launch your cluster within a few minutes. Additionally, the service provider takes care of ongoing maintenance, such as patching software and monitoring cluster health. Your team can focus on building high-quality data pipelines.

High availability and reliability

In a self-managed cluster, you must configure replication across brokers, set up failover mechanisms, and continuously monitor the cluster to ensure high availability and minimal downtime. In contrast, managed Kafka services provide built-in redundancy by replicating data across multiple availability zones. If a broker fails, they automatically route traffic to a healthy broker, minimizing downtime and data loss. Cloud providers also offer continuous monitoring to identify potential issues before they disrupt business operations.
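The redundancy described above comes down to two topic settings. A hedged sketch of the arithmetic: with a replication factor and a `min.insync.replicas` value, the difference between the two is how many brokers can fail while producers using `acks=all` keep writing. The helper function is illustrative, not part of any managed-service API.

```python
# Sketch: how Kafka replication settings translate into broker-failure
# tolerance. tolerated failures = replication.factor - min.insync.replicas
# is standard Kafka behavior; the helper itself is illustrative.

def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Brokers that can fail while producers with acks=all keep writing."""
    if min_insync_replicas > replication_factor:
        raise ValueError("min.insync.replicas cannot exceed replication.factor")
    return replication_factor - min_insync_replicas

# A common default: 3 replicas, 2 required in sync -> one broker can fail safely.
print(tolerated_broker_failures(3, 2))  # -> 1
```

With `tolerated_broker_failures(3, 3)` the answer is 0: every broker is required in sync, so a single failure blocks writes, which is why 3/2 is the usual trade-off.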

Management costs

Self-managed Kafka clusters require significant operational overhead. This includes managing the initial setup, regular software upgrades, security patches, and scaling the infrastructure as data volume fluctuates. These tasks demand dedicated resources and expertise, often leading to higher costs and potential risks, especially in complex, large-scale environments.

Managed Kafka solutions significantly reduce these management costs. They provide automated tools for provisioning, upgrades, and patching. Your Kafka cluster is always up-to-date without manual intervention. Additionally, managed services offer auto-scaling features that adjust resources based on real-time demand, optimizing both performance and cost efficiency.
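To make the auto-scaling idea concrete, here is a minimal sketch of the threshold-based decision a managed service might apply. The thresholds, utilization numbers, and broker counts are invented for illustration; they are not any provider's actual policy.

```python
# Illustrative threshold-based autoscaler: scale up under load, scale
# down when idle, never below a minimum broker count. All numbers are
# made-up examples, not a real provider's scaling policy.

def desired_broker_count(current: int, cpu_utilization: float,
                         scale_up_at: float = 0.75, scale_down_at: float = 0.30,
                         min_brokers: int = 3) -> int:
    """Return the broker count a simple threshold autoscaler would pick."""
    if cpu_utilization > scale_up_at:
        return current + 1
    if cpu_utilization < scale_down_at and current > min_brokers:
        return current - 1
    return current

print(desired_broker_count(3, 0.85))  # high load -> 4
print(desired_broker_count(6, 0.10))  # idle -> 5
print(desired_broker_count(3, 0.10))  # already at the minimum -> 3
```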

Security and compliance

Security is a major concern when handling sensitive data. Self-managed clusters require you to implement and regularly update security measures such as encryption, authentication, and authorization. In contrast, managed Kafka services come with pre-configured security settings. They ensure that your data is protected by default. They also comply with industry standards like GDPR, HIPAA, and SOC 2, simplifying regulatory compliance.
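In practice, "pre-configured security" means the provider hands you client settings for encryption in transit and authentication. A hedged sketch of what those settings look like, using the librdkafka-style key names common to Kafka clients; the endpoint and credentials are placeholders, not real values.

```python
# Sketch of the client-side security settings a managed Kafka service
# typically provides: TLS in transit plus SASL authentication. Key names
# follow librdkafka/confluent-kafka conventions; endpoint and credentials
# are placeholders.

def secure_client_config(bootstrap: str, username: str, password: str) -> dict:
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",     # encrypt traffic in transit
        "sasl.mechanisms": "SCRAM-SHA-512",  # authenticate the client
        "sasl.username": username,
        "sasl.password": password,
    }

cfg = secure_client_config("broker-1.example.com:9094", "app-user", "s3cret")
print(cfg["security.protocol"])  # -> SASL_SSL
```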

Cost efficiency

The pay-as-you-go pricing model of managed Kafka services helps organizations manage costs more effectively. Self-managed clusters often require over-provisioning of resources to accommodate peak loads. In contrast, managed services automatically scale resources up or down based on actual usage. This flexibility leads to significant cost savings, especially in environments with variable workloads.
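A back-of-the-envelope comparison makes the difference visible. Assuming an invented rate and a workload with a short daily peak, paying for actual hourly usage costs a fraction of provisioning for the peak around the clock; real provider pricing differs.

```python
# Toy cost comparison: over-provisioned fixed capacity vs pay-as-you-go.
# Rates and load numbers are invented for illustration only.

def fixed_cost(peak_units: int, rate_per_unit_hour: float, hours: int) -> float:
    """Self-managed style: provision for peak, pay for it every hour."""
    return peak_units * rate_per_unit_hour * hours

def pay_as_you_go_cost(hourly_usage: list, rate_per_unit_hour: float) -> float:
    """Managed style: pay only for what each hour actually used."""
    return sum(hourly_usage) * rate_per_unit_hour

# One day with a two-hour peak of 10 capacity units, 2 units otherwise.
usage = [10 if h in (9, 10) else 2 for h in range(24)]
print(fixed_cost(10, 0.50, 24))         # -> 120.0
print(pay_as_you_go_cost(usage, 0.50))  # -> 32.0
```

Under these made-up numbers, pay-as-you-go is roughly a quarter of the over-provisioned cost; the gap grows with how spiky the workload is.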

Operational focus

Finally, one of the biggest advantages of using managed Kafka services is the ability to shift focus from infrastructure management to application development. Managed services handle the complexities of running Kafka, allowing your team to concentrate on building data pipelines and delivering business value.
By offloading the operational burden to a managed service provider, you can reduce the direct costs associated with running Kafka and the indirect costs related to potential downtime or performance issues.

Examples of managed Kafka services

AWS, Azure, and GCP all offer managed Kafka services. Our Kafka cloud guide covers each service in detail and provides tutorials on getting started. We give an overview below.

AWS Cloud

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is AWS's native managed Kafka service. It enables your applications to run with minimal downtime: it automatically detects a failed broker and replaces it with a healthy or new one. MSK integrates with other AWS services like S3 for storage, IAM for access management, and Lambda for serverless processing.

Steps to get started:

  1. Sign in to the AWS console and navigate to the Amazon MSK console.
  2. Select the “Create Cluster” button and choose “Quick Create” as the creation option, which lets you create a cluster with default settings.
  3. For the cluster name, add a descriptive name for your cluster (this can’t be changed later).
  4. Configure VPC, subnets, and security groups to secure your cluster.
  5. Create an IAM role with permissions to create topics on the cluster and to send data to those topics.
  6. Set up a client machine from which you can create a topic and produce and consume data. Launch an EC2 instance within the same VPC as the MSK cluster.
  7. Edit the security group associated with the MSK cluster to accept inbound traffic from the client machine you set up in the previous step.
  8. Connect to the client machine, install Kafka, and then create a topic.
  9. Run the commands to start a console Producer and send messages to the topic. Then run the commands to start a console Consumer and receive messages from the topic.
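Steps 8-9 can also be done with the kafka-python client instead of the console scripts. The sketch below assumes placeholder broker endpoints for your MSK cluster; the produce and consume calls are wrapped in functions because nothing here runs without a live cluster.

```python
# Produce/consume sketch with kafka-python (pip install kafka-python).
# Broker hostnames are placeholders; the producer/consumer functions are
# defined but only useful against a running MSK cluster.

def send_messages(bootstrap: str, topic: str, messages: list) -> None:
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for m in messages:
        producer.send(topic, value=m.encode("utf-8"))
    producer.flush()  # block until all messages are delivered

def read_messages(bootstrap: str, topic: str) -> None:
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap,
                             auto_offset_reset="earliest")
    for record in consumer:
        print(record.value.decode("utf-8"))

# MSK hands you broker endpoints; clients take them as one comma-separated string.
def bootstrap_string(brokers: list) -> str:
    return ",".join(brokers)

print(bootstrap_string(["b-1.msk.example.com:9092", "b-2.msk.example.com:9092"]))
```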

Azure Cloud

HDInsight Kafka is the managed Kafka service within the Azure cloud, providing a simplified configuration process. HDInsight only scales upward: you can increase the number of worker nodes after cluster creation, but you cannot decrease the number of brokers within a cluster.

Steps to get started:

  1. Sign in to the Azure portal and select the “Create a resource” option.
  2. Navigate to Analytics > Azure HDInsight to open the “Create HDInsight cluster” page.
  3. Provide project details, including the subscription and the resource group within which you want to create the cluster.
  4. Provide the name of your cluster and the region where it should be deployed.
  5. Choose “Kafka” as the cluster type and configure cluster credentials.
  6. In the next step, provide storage details like storage type, selection method, and primary storage account.
  7. For this quickstart tutorial, leave the security settings as default.
  8. In the next step, choose the number of nodes based on your availability requirements and budget.
  9. Select the “Review + Create” tab and create the cluster.
  10. After deployment, you can connect your producers and consumers via the cluster’s public or private endpoints.

Google Cloud

Google Cloud's Managed Service for Apache Kafka offers tools for cluster creation and automatic scaling within the GCP ecosystem, making it easier to set up Kafka for event-driven systems and streaming pipelines. However, while these services simplify initial deployment and scaling, they do not fully manage ongoing operational tasks. Users are still responsible for day-2 operations, including maintenance, performance optimization, and troubleshooting. This means that while managed services like Google Cloud and AWS MSK reduce some operational overhead, they do not eliminate the need for in-house expertise to handle the full lifecycle of Kafka management.

Steps to get started:

  1. Go to your Google Cloud Console and navigate to the “Clusters” page.
  2. Select “Create” and enter a cluster name.
  3. Provide the region where you want the cluster to be deployed.
  4. Configure the number of vCPUs and the amount of memory for your Kafka setup.
  5. In network configurations, provide the project, network, and subnet details.
  6. Choose an encryption method, either Google-managed keys or customer-managed keys.
  7. Click the “Create” button to create the cluster.
  8. Once the cluster is deployed, you can monitor it or set up a topic on it to produce and consume messages.

Some more managed Kafka options

Managed Kafka in the public cloud is not as fully managed as the name implies. You get infrastructure management and support for cluster provisioning, but beyond that, you are on your own. You are still responsible for the operational aspects of Kafka, such as configuration tuning, topic management, and broker performance monitoring.

For more managed options, consider the following.

Instaclustr

Instaclustr is a managed platform for open-source technologies that lets you deploy Kafka clusters in any cloud of your choice. The service includes automated provisioning, monitoring, and scaling, with a focus on maintaining high performance. You get dedicated support for a per-node support fee. Getting started with Instaclustr is easy, but it is also not fully managed: you don't get ongoing maintenance, upgrades, or configuration support.

Aiven

Aiven is a third-party solution that lets you deploy your Kafka clusters on any cloud of choice. You can migrate between clouds, deploy new nodes, or set up clusters independent of underlying cloud infrastructure. Instead, you can manage your Kafka between clouds from a central dashboard. The downside is that performance drops significantly on Aiven as your clusters grow.

Confluent Cloud

Confluent Cloud provides a managed service for provisioning and managing Kafka clusters. It includes a schema registry, stream processing (KSQL), and connectors for various data sources and sinks. The downside is that Confluent Cloud is too expensive for many use cases, and costs add up quickly at scale.

Redpanda Cloud

While various managed Kafka services exist, Redpanda Cloud offers a compelling alternative. Redpanda is a drop-in Kafka replacement that can reduce costs by 6x while improving the performance of your Kafka workloads. Redpanda Cloud offers a range of deployment models:

  • BYOC clusters - hosted on your cloud but fully managed by Redpanda
  • Dedicated clusters - hosted on Redpanda cloud infrastructure and managed by Redpanda
  • Serverless clusters - hosted on shared infrastructure but securely isolated and fully managed by Redpanda

Serverless clusters are fully managed throughout their entire lifecycle. Redpanda's Bring Your Own Cloud (BYOC) model allows companies to maintain full control over their data while benefiting from a fully managed service. This approach provides an extra layer of security and compliance, ensuring that sensitive data never leaves the organization’s controlled environment.

Start your managed Kafka journey with Redpanda Cloud today to unlock the full potential of your data pipelines.

Best practices with managed Kafka

We recommend the following best practices if you are choosing managed Kafka services in the public cloud.

Storage

Set up suitable data retention policies for your topics to manage storage costs. Leverage tiered storage options so that non-critical data is archived to a cheaper storage service. Set up automatic deletion of idle topics to reduce their storage cost. You can also enable compression for your Kafka topics to reduce the size of messages being sent, which lowers storage costs and improves throughput.
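The retention and compression recommendations above map to standard Kafka topic-level configs (`retention.ms`, `retention.bytes`, `compression.type`, `cleanup.policy`). A hedged sketch of assembling them; the 7-day and 50 GB values are example choices, not recommendations for your workload.

```python
# Build the topic-config dict that the storage best practices map to.
# Config keys are standard Kafka topic configs; values are examples.

def retention_config(days: int, max_gb: int, compression: str = "lz4") -> dict:
    return {
        "retention.ms": str(days * 24 * 60 * 60 * 1000),  # drop data older than N days
        "retention.bytes": str(max_gb * 1024 ** 3),       # cap per-partition size
        "compression.type": compression,                  # shrink stored messages
        "cleanup.policy": "delete",                       # delete expired segments
    }

cfg = retention_config(days=7, max_gb=50)
print(cfg["retention.ms"])  # -> "604800000" (7 days in milliseconds)
```

A dict like this is what you would pass to your provider's topic-management API or to an admin client when altering topic configs.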

Security

Use identity and access management services to control access to your Kafka resources. Set up your cluster within a private network to protect your resources. Implement a regular backup mechanism for your data to ensure no or minimal data loss during any outage.

Availability

Configure replication instances in multiple availability zones or regions. This way, if one region goes down, your Kafka cluster can continue services from another region. Integrate with cloud provider monitoring services to monitor Kafka metrics such as throughput, latency, and error rates.
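The idea behind spreading replicas across zones can be sketched as a round-robin placement: each partition's replicas land in distinct availability zones, so losing one zone leaves a copy elsewhere. Kafka itself does this via the `broker.rack` setting; this illustrative helper only shows the intent, and the zone names are placeholders.

```python
# Illustrative round-robin placement of partition replicas across
# availability zones (the idea behind Kafka's rack-aware assignment via
# broker.rack). Zone names are placeholders.

def assign_replicas(partitions: int, zones: list, replication_factor: int) -> dict:
    """Map each partition to replicas spread over distinct zones."""
    assignment = {}
    for p in range(partitions):
        assignment[p] = [zones[(p + r) % len(zones)]
                         for r in range(replication_factor)]
    return assignment

plan = assign_replicas(3, ["us-east-1a", "us-east-1b", "us-east-1c"], 2)
print(plan[0])  # -> ['us-east-1a', 'us-east-1b']
```

As long as the replication factor does not exceed the number of zones, every partition survives the loss of any single zone.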

Last thoughts

Managed Kafka services empower organizations to leverage Kafka's real-time data processing abilities without worrying about managing the underlying infrastructure. You can focus on core business operations with available features like scalability, security, high availability, monitoring, and integration with other cloud services.
