
At Redpanda, one of our guiding principles is this: you can't improve what you can't see. That’s why observability is at the heart of how we support customers, design features, and roll out improvements.
Recently, we collaborated with Chess.com — the world’s largest chess website with over 200 million members. As the #1 hub for online chess, this powerhouse platform hosts more than ten million chess games every day. However, they were experiencing a significant performance issue: one of their largest data streams was consuming an excessive amount of CPU, placing pressure on their real-time infrastructure.
Here’s how we used observability to diagnose the problem, improve our product, and deliver a major performance boost — all with zero downtime.
The opening: high CPU and growing resource pressure
The customer was running a critical data stream that had grown to a massive size: tens of terabytes. Over time, it began using more and more CPU, threatening the health of their entire system.
Through our dashboards and monitoring tools, we traced the problem to one core area: compaction, the process of cleaning up old or duplicate records in the stream, running on compressed data. It was running inefficiently and too frequently, using far more resources than necessary.
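For readers new to the idea, here is a minimal Python sketch of what key-based compaction does conceptually: for each key, only the most recent record survives. This is an illustration, not Redpanda's implementation. The `Record` type, the `compact` function, and the choice to drop delete markers immediately are all assumptions made for the example. On a compressed stream, a pass like this generally has to decompress each batch before it can read the keys, which is one reason it can be CPU-hungry.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical record shape used only for this illustration."""
    offset: int
    key: str
    value: Optional[str]  # None stands in for a delete marker (tombstone)


def compact(log: list[Record]) -> list[Record]:
    """Keep only the newest record per key, in offset order.

    Simplification: tombstones are dropped right away. Real systems
    typically retain them for a configurable period first.
    """
    latest: dict[str, Record] = {}
    for record in log:
        # Later records overwrite earlier ones that share the same key.
        latest[record.key] = record
    survivors = [r for r in latest.values() if r.value is not None]
    return sorted(survivors, key=lambda r: r.offset)


log = [
    Record(0, "game-1", "started"),
    Record(1, "game-2", "started"),
    Record(2, "game-1", "finished"),
    Record(3, "game-2", None),  # game-2 was deleted
]
compacted = compact(log)
```

In this example, four records shrink to one: the latest state of `game-1`. The work scales with the amount of data scanned, so running such a pass too often on a stream of tens of terabytes adds up quickly.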
The winning move: smarter compaction, powered by observability
Thanks to the insights from our observability platform, we were able to identify exactly where our system could be more efficient. The good news? We had just introduced improvements in Redpanda version 25.1 — including smarter controls that determine when and how compaction should run.
With the customer’s consent, we scheduled a safe and seamless upgrade. Our team closely monitored every step of the rollout, using real-time metrics to track performance, resource usage, and system health.
Checkmate: less CPU, lower latency, happier users
The results were immediate and dramatic:
- CPU usage dropped sharply across the entire system
- Latency was cut nearly in half, meaning faster data delivery
- The system became more stable with fewer spikes in usage or risk of overload
Even better, we saw that compaction was now running at the right times, on the right data, with much less effort — exactly as intended.
The Grafana chart below illustrates the significant reduction in CPU utilization on a single broker within the cluster.

Redpanda’s winning observability strategy
This wasn’t a lucky move; it was observability playing chess, not checkers. And this isn’t just a story about fixing one issue. It reflects how we operate at Redpanda:
- We invest deeply in observability, so we can spot issues early and solve them confidently
- We continuously improve our product, so every customer benefits from smarter defaults and better performance
- And we validate every change through real-world customer outcomes, not just benchmarks or theory
For this customer, the fix meant smoother gameplay experiences and more efficient infrastructure. For us, it was a reminder that sometimes the biggest wins come from small, smart changes — made visible through great observability.
So if you're running large-scale data streams and want better performance with fewer headaches, get in touch or check out these handy resources:
- How to set up observability for Redpanda
- Batch tuning in Redpanda for optimized performance
- Best practices for tuning and monitoring Redpanda
- How to monitor Redpanda Connect (docs)