How a global gaming platform cut latency and CPU load with smarter observability

Learn how Redpanda delivered a game-changing performance boost for Chess.com

June 16, 2025
TL;DR Takeaways:
How did Redpanda use observability to solve the performance issue at Chess.com?

Redpanda used observability to pinpoint where its system could be more efficient: compaction on compressed data. Guided by those insights, the team upgraded Chess.com to Redpanda version 25.1, which introduced smarter controls that determine when and how compaction should run. This led to a major performance boost.

What resources does Redpanda provide for setting up observability and optimizing performance?

Redpanda provides several resources for setting up observability and optimizing performance. These include guides on how to set up observability for Redpanda, batch tuning in Redpanda for optimized performance, best practices for tuning and monitoring Redpanda, and how to monitor Redpanda Connect.

What was the performance issue faced by Chess.com and how was it identified?

Chess.com, which hosts over ten million chess games daily, was facing a significant performance issue. One of their largest data streams was consuming an excessive amount of CPU, putting pressure on their real-time infrastructure. The issue was traced to compaction on compressed data, which was running inefficiently and too frequently, using more resources than necessary. This problem was identified through Redpanda's dashboards and monitoring tools.

What were the results of the improvements made by Redpanda at Chess.com?

The improvements produced immediate and dramatic results. CPU usage dropped sharply across the entire system, latency was cut nearly in half, and the system became more stable with fewer spikes in usage or risk of overload. Additionally, compaction was now running at the right times, on the right data, with much less effort.

Why is observability important in Redpanda's strategy?

Observability is a key part of Redpanda's strategy because it allows them to spot issues early and solve them confidently. It also enables them to continuously improve their product, ensuring every customer benefits from smarter defaults and better performance. Furthermore, it allows them to validate every change through real-world customer outcomes, not just benchmarks or theory.


At Redpanda, one of our guiding principles is this: you can't improve what you can't see. That’s why observability is at the heart of how we support customers, design features, and roll out improvements.

Recently, we collaborated with Chess.com — the world’s largest chess website with over 200 million members. As the #1 hub for online chess, this powerhouse platform hosts more than ten million chess games every day. However, they were experiencing a significant performance issue: one of their largest data streams was consuming an excessive amount of CPU, placing pressure on their real-time infrastructure.

Here’s how we used observability to diagnose the problem, improve our product, and deliver a major performance boost — all with zero downtime.

The opening: High CPU and growing resource pressure

The customer was running a critical data stream that had grown to a massive size: tens of terabytes. Over time, it began using more and more CPU, threatening the health of their entire system.

Through our dashboards and monitoring tools, we traced the problem to one core area: compaction on compressed data. Compaction is the process of cleaning up old or duplicate records in the stream. Here, it was running inefficiently and too frequently, using far more resources than necessary.
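To make the idea concrete, here is a minimal Python sketch of key-based compaction over compressed batches. It is an illustration only, not Redpanda's implementation: the batch format (zlib-compressed JSON), the helper names, and the game keys are all hypothetical. It does show why this kind of work can be CPU-hungry, since every batch has to be decompressed to inspect its records and recompressed after filtering.

```python
import json
import zlib


def make_batch(records):
    """Compress a list of [key, value] records into one opaque batch."""
    return zlib.compress(json.dumps(records).encode("utf-8"))


def read_batch(batch):
    """Decompress a batch back into its list of [key, value] records."""
    return json.loads(zlib.decompress(batch).decode("utf-8"))


def compact(batches):
    """Keep only the newest record per key across all batches."""
    # Pass 1: remember the position of the latest record for every key.
    latest = {}
    for b, batch in enumerate(batches):
        for r, (key, _value) in enumerate(read_batch(batch)):
            latest[key] = (b, r)

    # Pass 2: rewrite each batch, keeping only records that are still
    # the latest for their key. Note the second decompression and the
    # recompression of whatever survives.
    compacted = []
    for b, batch in enumerate(batches):
        survivors = [
            [key, value]
            for r, (key, value) in enumerate(read_batch(batch))
            if latest[key] == (b, r)
        ]
        if survivors:
            compacted.append(make_batch(survivors))
    return compacted


if __name__ == "__main__":
    log = [
        make_batch([["game-1", "e4"], ["game-2", "d4"]]),
        make_batch([["game-1", "e4 e5"]]),
    ]
    for batch in compact(log):
        print(read_batch(batch))
```

In this toy log, the older `game-1` record is dropped and the newer one is kept, while `game-2` is untouched. Running the two passes over a stream of tens of terabytes, and doing so more often than needed, is where the cost adds up.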

The winning move: smarter compaction, powered by observability

Thanks to the insights from our observability platform, we were able to identify exactly where our system could be more efficient. The good news? We had just introduced improvements in Redpanda version 25.1 — including smarter controls that determine when and how compaction should run.
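This post does not spell out what those controls are. One common way log-based systems decide when compaction is worth running is a "dirty ratio" threshold, as with Apache Kafka's `min.cleanable.dirty.ratio` setting: compaction is skipped until enough of the log consists of records that have not yet been compacted. The Python sketch below illustrates that general technique. The function names and the 0.5 default are assumptions for illustration, and whether Redpanda 25.1 works exactly this way is not confirmed here.

```python
def dirty_ratio(clean_bytes, dirty_bytes):
    """Fraction of the log that has not been compacted yet."""
    total = clean_bytes + dirty_bytes
    return dirty_bytes / total if total else 0.0


def should_compact(clean_bytes, dirty_bytes, min_dirty_ratio=0.5):
    """Run compaction only when enough of the log is dirty to justify the CPU cost."""
    return dirty_ratio(clean_bytes, dirty_bytes) >= min_dirty_ratio


if __name__ == "__main__":
    # Mostly clean log: compaction would redo work for little gain, so skip it.
    print(should_compact(clean_bytes=90, dirty_bytes=10))
    # Mostly dirty log: compaction is likely to reclaim meaningful space.
    print(should_compact(clean_bytes=40, dirty_bytes=60))
```

The trade-off is simple: a higher threshold means compaction runs less often and costs less CPU, while duplicates linger a little longer before being cleaned up.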

With the customer’s consent, we scheduled a safe and seamless upgrade. Our team closely monitored every step of the rollout, using real-time metrics to track performance, resource usage, and system health.

Checkmate: less CPU, lower latency, happier users

The results were immediate and dramatic:

  • CPU usage dropped sharply across the entire system
  • Latency was cut nearly in half, meaning faster data delivery
  • The system became more stable with fewer spikes in usage or risk of overload

Even better, we saw that compaction was now running at the right times, on the right data, with much less effort — exactly as intended.

The Grafana chart below illustrates the significant reduction in CPU utilization on a single broker within the cluster.

Graph showing drastic CPU reduction with Redpanda

Redpanda’s winning observability strategy  

This wasn’t a lucky move. It was observability playing chess, not checkers. Nor is this simply a story about fixing one issue; it reflects how we operate at Redpanda:

  • We invest deeply in observability, so we can spot issues early and solve them confidently
  • We continuously improve our product, so every customer benefits from smarter defaults and better performance
  • And we validate every change through real-world customer outcomes, not just benchmarks or theory

For this customer, the fix meant smoother gameplay experiences and more efficient infrastructure. For us, it was a reminder that sometimes the biggest wins come from small, smart changes — made visible through great observability.

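So if you're running large-scale data streams and want better performance with fewer headaches, get in touch or check out our guides on setting up observability for Redpanda, batch tuning for optimized performance, best practices for tuning and monitoring Redpanda, and monitoring Redpanda Connect.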
