
Run your Redpanda at full throttle with our team’s top recommendations
Everyone knows Redpanda is fast, but how can you ensure you’re getting the most out of your clusters? Performance isn’t always as easy as flipping a switch — particularly if your architecture isn’t already set up for it. Much like upgrading a car, you need to tweak and tune different parts to get everything running together smoothly.
In this blog post, we run through 9 checklist items to help you experience the most streamlined real-time platform possible.
This is an obvious one, but always worth stating: infrastructure matters. Redpanda was designed to run on modern hardware, so clusters with older, slower components or that run out of resources (whether that’s CPU, memory, disk throughput, IOPS, or network bandwidth) won’t perform as well.
Recommendations:
Pro-tip: For folks who don’t want to manage infrastructure, there’s an easy button for performance: deploy on Redpanda Cloud! This is a fully managed offering, suitable for the smallest use cases (using our serverless clusters) right through to the largest deployments — without the hassle of managing a cluster yourself.
Now that you have a cluster up and running, performance is largely determined by application design and data architecture. (That one’s on you!)
For self-managed clusters with slower storage media (SATA SSDs, spinning disks, SAN, remote storage — anything other than locally attached NVMe), Redpanda’s default behaviour of performing an fsync after every message batch could be slowing you down. Ideally, you’d always deploy Redpanda on the latest and greatest hardware, but sometimes the reality is deploying on what you already have, and that’s where write caching shines.
Rather than the gold standard of performing an fsync after every message batch, write caching holds your data in broker memory until there’s either enough data or enough time passes, and then syncs to disk in a larger write. Your producer will still only receive an acknowledgement once your data is held in memory on a majority of brokers, as long as you use acks=all. However, you’re definitely trading durability for performance, so make sure you’ve considered this.
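As a rough sketch of what this looks like from the client side (assuming a local cluster on localhost:9092 and a topic called events, both placeholders), the producer keeps acks=all while write caching is enabled on the topic:

```python
# Minimal sketch, not a drop-in config. Broker address and topic name are
# placeholders. Write caching itself is enabled on the broker side, e.g. via
# the write.caching topic property (check the docs for your Redpanda version).
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    # Keep acks=all so the produce request is only acknowledged once the batch
    # is held in memory on a majority of brokers, even with write caching on.
    "acks": "all",
})

producer.produce("events", key=b"sensor-42", value=b'{"reading": 17.3}')
producer.flush()
```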
Recommendations:
Use acks=all when using write caching to ensure data is in memory on multiple brokers.

To handle streaming workloads larger than a single broker can manage, topics are broken into partitions. But how those partitions are then used is down to your producers, rather than Redpanda.
If your application doesn’t use partitions evenly, it could start to bottleneck on skewed partitions — if not at the producers, then perhaps at the consumers, which can then lead to processing imbalances at the broker level.
Think of this as the data equivalent of Amdahl’s Law: data skew is the enemy of parallelization, limiting the benefits of scaling out by using more partitions. If 90% of your data goes through a single partition, then whether you have 10 partitions or 50 won’t really make a difference since that single overworked partition is your limiting factor.
Recommendations:
Read our guide to partition strategies
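As an illustration, here’s a hypothetical way to sanity-check key distribution before relying on it. It uses CRC32 as a stand-in hash, so treat it as a rough skew check rather than an exact mapping of your client’s partitioner:

```python
# Hypothetical skew check: how evenly would these keys spread across partitions?
import zlib
from collections import Counter

def bucket(key: bytes, num_partitions: int) -> int:
    # Stand-in hash; the real client partitioner may differ.
    return zlib.crc32(key) % num_partitions

# Only three distinct keys across 10,000 messages: a classic skew problem.
keys = [f"customer-{i % 3}".encode() for i in range(10_000)]
counts = Counter(bucket(k, 12) for k in keys)
print(counts)  # at most 3 of the 12 buckets get any traffic at all
```

If the counts come out lopsided, adding more partitions won’t help; you need a key (or composite key) with higher cardinality.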
Imagine you work at a widget manufacturer and your boss asks you to send 10,000 widgets to a customer. Would you prefer to send them in 10,000 individually wrapped packages, or simply send them all in one big box?
This is the essence of batching. Rather than sending messages one by one, collating them into batches first can make things much more efficient. If your producers aren’t batching today, it’s likely they’re not as efficient as they could be. Batching does mean intentionally introducing latency into the produce pipeline, but it’s often a worthy tradeoff and can lead to lower latency overall since the broker is more efficient.
Recommendations:
Tune linger.ms and the maximum batch size to get your producers batching at their best.

Read our blog post series on batch tuning
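Here’s a minimal sketch of what that tuning can look like with the standard client settings (broker address, topic, and the exact values are placeholders to adjust against your own latency budget):

```python
# Minimal producer batching sketch; values are illustrative, not recommendations.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "linger.ms": 10,          # wait up to 10 ms for a batch to fill before sending
    "batch.size": 1_000_000,  # cap each batch at roughly 1 MB
    "acks": "all",
})

for i in range(10_000):
    producer.produce("events", key=f"widget-{i}".encode(), value=b"...")
    producer.poll(0)          # serve delivery callbacks without blocking
producer.flush()
```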
When building out applications, many folks get a data pipeline up and running and leave it alone. But while the default settings of a consumer are a reasonable starting point, one size doesn’t necessarily fit all. Most consumers will have a preference for either low latency or high throughput, and explicitly tuning the configuration towards that preference can have a huge impact on performance.
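For example, a consumer that favours throughput might ask the broker to hand back larger fetches (a sketch with placeholder names; flip the fetch settings the other way if latency matters more):

```python
# Throughput-leaning consumer sketch; group, broker, and topic are placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "analytics-service",
    "auto.offset.reset": "earliest",
    "fetch.min.bytes": 1_000_000,   # let the broker accumulate ~1 MB per fetch...
    "fetch.wait.max.ms": 500,       # ...or wait at most 500 ms for it
})
consumer.subscribe(["events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    print(msg.value())  # your processing logic goes here
```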
Recommendations:
Remember the good ol’ days of typing on your computer and having to press “Save” every few minutes in case everything crashed? Now imagine pressing “Save” after every <click!> single <click!> word <click!>. You’d take forever getting anything finished.
When consuming a topic, committing your consumer group offsets is exactly like pressing the save button. You record where you’ve read to, and just like that save, each commit takes time and resources. If you commit too frequently, your consumer will be less efficient, but it can also start to impact the broker as your consume workload gradually transforms (somewhat unknowingly) into a consume AND produce workload, since each read is accompanied by a commit write.
Many folks try to commit excessively often to minimize re-reads during an application restart. While that initially sounds plausible, re-reading some amount of data occasionally is expected for most streaming applications, so if your application already has to cope with re-reading a few milliseconds of messages, it can probably cope with a few seconds worth.
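In client terms, that usually just means leaving auto-commit on and keeping the interval sensible. A sketch (names are placeholders):

```python
# Offsets are committed in the background every 5 seconds (the default),
# rather than after every message, so the consume workload doesn't quietly
# turn into a produce workload as well.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processor",
    "enable.auto.commit": True,
    "auto.commit.interval.ms": 5000,
})
consumer.subscribe(["orders"])
```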
Recommendations:
Set auto.commit.interval.ms to a reasonable value. Generally, one second or higher; the default is 5 seconds. Low milliseconds is right out!

Producers spend their days sending data, which Redpanda dutifully writes to NVMe devices and replicates over the network to other brokers, which do the same. Consumers then send requests for data over the network, and Redpanda retrieves it (from memory or NVMe) and sends it back. Finally, consumers send in their commits. That’s a lot of data transfers.
Each of those transfers takes place on a medium (such as a network or disk) that ultimately has a fixed capacity. For better efficiency and to send data more quickly, the only trick we have is to compress it — making it smaller and therefore quicker to send. If you can compress messages at a ratio of 5:1, you can reduce what you would have sent by 80%, which helps every stage of the data lifecycle (ingestion, storage, and retrieval).
There are many choices of compression codecs. Some will compress extremely well, but also require a significant amount of CPU time and memory. Others will compress more moderately, but use far fewer resources. A classic tradeoff.
As long as producers compress the data and consumers decompress it, the choice of codec only affects the clients themselves. While it’s possible to configure Redpanda to compress batches on your behalf, it’s best practice for clients to do this work themselves.
Recommendations:
Use ZSTD or LZ4 for a good balance between compression ratio and CPU time if compression is essential.
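Putting that into practice on the producer is a one-line config change (a sketch; names are placeholders, and consumers decompress automatically):

```python
# Compress on the producer so every later stage moves smaller batches around.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "compression.type": "zstd",   # or "lz4" for lighter CPU usage
    "linger.ms": 10,              # larger batches also compress better
    "acks": "all",
})
```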
One of the more interesting features of Redpanda is compaction, which allows older values for a given key to be dropped, keeping only the latest value for a key. This is often used when a topic holds change messages that need to be replayed in the event of a service restart. Replaying intermediate values has no benefit, so they can be removed, improving service start-up time.
The compaction process runs in the broker and is actually the only use case where the broker reads message-level details from a topic. Usually, Redpanda treats the data as opaque bytes that need to be sent without reading them in detail. It gets interesting when you combine compaction with compression.
We usually recommend that compression takes place in clients (see above) for performance reasons, but with compaction, the broker can no longer stay out of that work. Both the read and write portions of the compaction process will use additional CPU to decompress and recompress the data.
As a result, combining compression (particularly with CPU-intensive codecs) with compaction can lead to significant CPU utilization. Again, this is a classic trade-off between space utilization and CPU time.
Recommendations:
See our blog on implementing a last value cache using WASM
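If you want to try this out, a compacted topic can be created through the standard Admin API (a sketch; the topic name, partition count, and broker address are placeholders):

```python
# Create a compacted topic: only the latest value per key is retained long-term.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
topic = NewTopic(
    "device-last-state",
    num_partitions=12,
    replication_factor=3,
    config={"cleanup.policy": "compact"},
)
admin.create_topics([topic])["device-last-state"].result()  # block until created
```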
Another great feature of Redpanda is the ability to use object storage for storing older topic data. While this is primarily discussed as a way to store more data in a topic than you have local space for, there are also performance benefits.
Decommissioning and recommissioning a broker can take time, as the data needs to be replicated away from the broker before it goes offline or re-replicated towards a new broker before it can start up and fully participate in the cluster. When tiered storage is in use, decommissioning and recommissioning can both be sped up by orders of magnitude, since a copy of the data already exists out in the object store. This means only the most recent data (that is yet to be written to tiered storage) needs to be moved to or from a broker.
Recommendations:
Check out our documentation on fast decommission and recommission
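As a sketch of enabling this on an existing topic, Redpanda exposes Tiered Storage through the redpanda.remote.read and redpanda.remote.write topic properties (topic name and broker address are placeholders, the cluster must already be configured with an object storage bucket, and rpk topic alter-config is often the simpler route):

```python
# Turn on Tiered Storage for one topic via the Admin API.
# Note: alter_configs is not incremental; prefer rpk or incremental_alter_configs
# on a real cluster so unrelated topic settings aren't reset.
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
resource = ConfigResource(
    ConfigResource.Type.TOPIC,
    "events",
    set_config={
        "redpanda.remote.write": "true",  # upload closed segments to object storage
        "redpanda.remote.read": "true",   # serve older reads from object storage
    },
)
admin.alter_configs([resource])[resource].result()  # wait for the change to apply
```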
Redpanda is a highly optimized message broker that delivers incredible performance due to its unique design advantages. However, to achieve the best results, it requires a solid combination of infrastructure, data architecture, and application design. This blog post outlined a handy checklist to help you get the most out of your clusters.
If you’re running Redpanda today and need to talk about performance, come chat with us in the Redpanda Community on Slack — we’re a friendly bunch.