Me and my shadow (link!): Disaster recovery replication made easy

Start replicating your data the Redpanda way

April 21, 2026

Most organizations require business continuity: the ability to keep operating while handling and recovering from a significant disruption, such as a regional outage. This is where disaster recovery comes in, and for streaming data, this almost always means continuous replication. While this would historically be done with external tooling, Redpanda now has built-in replication that’s fast, simple, and offers perfect fidelity.

For teams that need reliable, cost-efficient replication with the lowest operational complexity, shadow linking is a safe bet.

In this blog post, we’ll walk you through the new Shadow Linking feature within Redpanda and inspire you to take even better care of your streaming data by replicating it to another cluster.

Why use replication?

Disaster recovery helps limit the revenue you lose when your systems become unavailable, even if only for an hour or so. Keeping disruption to a minimum means first understanding, and later reducing, two elements:

Recovery Time Objectives (RTOs): how long you can afford to take to get back up and running again
Recovery Point Objectives (RPOs): how much data loss (measured in time) is acceptable

Replication helps achieve both; RPO can be reduced by replicating in near real time, while RTO can be reduced by enabling application recovery through failover.

Replication is also useful beyond disaster recovery:

Platform migration: Let’s say you just built a shiny new Redpanda cluster. For some workloads, you create the topics and start using them from empty. For others, you need to replicate (backfill) your historical data first, then keep it in sync as you move your workloads over, a process that takes time.

Multi-region data distribution/fan-out architectures: Let’s say you have a service with a large number of subscribers across different geographic regions, but each wants to monitor the same event feed. Rather than always having every subscriber read from a single central cluster, we can instead replicate the data to region-local clusters and let subscribers pull data from their region-local instance.

What is Shadowing?

Shadowing is Redpanda’s enterprise-grade disaster recovery feature that’s built into the heart of the broker itself. It replicates everything you need to use a topic on a remote cluster:

Topic data: All records are replicated byte-for-byte, preserving offsets, timestamps, headers, compression, and batching. You can control which topics are replicated.

Topic configurations: This includes the partition count and topic properties such as retention, compression, and cleanup policy. Not all properties are replicated; be sure to check out the docs!

Consumer group data: Committed offsets and group membership, enabling failover of consumers. You can control which groups are replicated.

ACLs / security policies: Access control lists are replicated to ensure consistent authorization across clusters.

Schema registry data: The _schemas topic can be replicated when the feature is enabled, allowing schemas (and schema settings, such as compatibility) to be replicated.

Shadow linking architecture

The architecture follows a simple active-passive pattern. The source cluster processes all production traffic while the shadow cluster remains in read-only mode, continuously receiving updates:

Active-passive replication through shadow linking

If a disaster occurs, you can fail over the shadow topics, making them fully writable on the shadow cluster. At that point, you can redirect your applications to the shadow, which becomes active as the new production cluster:

Failover of producers and consumers during a disaster.

A shadow link is defined within the shadow cluster and creates tasks internal to the broker that read data from the source cluster and write it locally. These tasks read data from the source using the standard Kafka API. Once the link is established, topics will be created and configured automatically, ACLs will be applied, commits will be replicated, and (of course!) messages will be mirrored, all on a continuous basis. Let the data flow!

Things to be aware of:

Offsets are preserved: Replicated messages are guaranteed to have the same offset on the shadow topic as they had on the source, which makes things like failover much, much easier.

Source clusters are unaware of the replication: A shadow link is configured on the destination cluster only. The source cluster is completely unaware of the link, aside from the additional read workload it sees.

Replication is asynchronous: As your upstream producers write messages to the source cluster, the acknowledgments they receive only indicate that those messages are durably written to that source cluster—not that the messages are also replicated to the shadow cluster.

Shadow topics are read-only until failed over: While a shadow link is replicating into a topic, that topic is read-only to all other producers, ensuring that it stays in sync with the source and its contents don’t diverge. It only becomes writable once failed over.

Network connectivity: Each broker in the shadow cluster runs replication tasks that read directly from the brokers in the source cluster, enabling massively parallel data transfer. This fully distributed approach provides excellent throughput and allows you to scale replication capacity simply by adding more brokers, up to the limit of your network.

Schema replication: Schemas aren’t replicated by default, but this is easily enabled when configuring a link.

Link deletion: You can only delete a shadow link once all of the flows are failed over and there are no active replication flows. This is A Good Thing™.
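To make the offset-preservation and asynchronous points above concrete, here’s a toy model in Python. It’s only a sketch of the idea, not how the broker is implemented: the Partition class and the replicate_once function are made-up names for illustration, and a real shadow link does this work inside the broker over the Kafka API.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A toy append-only log; a record's offset is its index in the list."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset: int, max_records: int) -> list:
        return self.records[offset:offset + max_records]

    @property
    def end_offset(self) -> int:
        return len(self.records)


def replicate_once(source: Partition, shadow: Partition, max_records: int = 2) -> int:
    """One pull cycle: fetch from the shadow's own end offset, append locally.

    The shadow always resumes from its own end offset and appends in order,
    so each record lands at the same offset it has on the source.
    Returns the number of records copied in this cycle.
    """
    batch = source.read_from(shadow.end_offset, max_records)
    for record in batch:
        shadow.append(record)
    return len(batch)


source, shadow = Partition(), Partition()

# Producers are acknowledged by the source alone (asynchronous replication).
acked = [source.append(f"event-{i}") for i in range(5)]
lag_before = source.end_offset - shadow.end_offset  # nothing replicated yet

# The shadow catches up over repeated pull cycles.
while replicate_once(source, shadow):
    pass

lag_after = source.end_offset - shadow.end_offset
print(acked, lag_before, lag_after)       # [0, 1, 2, 3, 4] 5 0
print(shadow.records == source.records)   # True
```

The gap between lag_before and lag_after is the window in which acknowledged messages exist only on the source, which is exactly what the RPO measures.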

Why use Shadow Linking for replication?

Asynchronous replication has been available using tooling such as Apache MirrorMaker 2.0 for many years, so why has Redpanda developed Shadow Linking now, and what is different about it?

1. Simplicity

Firstly, Shadow Linking is broker-native functionality, not an external service. When we replicate messages, we guarantee they will always have the same offset as they do on the source, without any offset translation.

This is a huge advantage over traditional approaches:

Failover is simpler; just reconnect your consumer to the new brokers and start reading from where you left off.

We guarantee data consistency for any data replicated between clusters, even if the link between them is unreliable. No more producer replays leading to message duplication.

Commits can be replicated directly between clusters without needing to translate the offsets first.

Operating a Shadow Link is also simple:

Shadow links have sensible default settings, meaning out-of-the-box performance is already great.

A link can be defined in many ways: using the command line (rpk), REST, Redpanda K8s operator, or interactively via console in your browser.

The state of the link and the replication flows it is handling can be viewed in console, via rpk and via REST, allowing easy understanding and integration.

Prometheus-compatible metrics to see the link status, including replication lag, are published by the broker, so your existing monitoring will automatically pick them up.

From either a development or an operational perspective, simplicity always has an outsized payoff; complexity adds cognitive load, adds unknowns and ambiguities, and increases the number of failure points. In short, it adds risk. Simple is good.

This simplicity means that failover isn’t something to fear, but something that can become routine. By practicing failover, teams can provide verifiable evidence of their disaster recovery readiness. Having high confidence in your preparedness (based on demonstrated capability) is infinitely more useful than the usual hopeful assumptions.

2. Efficiency

Shadow Links also have the benefit of efficient infrastructure. Consider replicating a stream of messages at 1 GiB/s using an external tool such as MirrorMaker:

Replicating without shadowing: more hardware, more money

In addition to the source and sink clusters, you would need another cluster to host the replication workload. In contrast, when using shadowing, no additional hardware is needed:

Replicating with Redpanda Shadowing: refreshingly simple and efficient

Worse still, external replication tools can’t guarantee the fidelity of the replicated data; it’s not uncommon for duplicate messages to be introduced by the replication layer. In other words, not only is the external tool approach more expensive, but it also yields a worse outcome. (Lose-lose, anyone?)

It’s always true that nothing comes for free; adding shadow linking to an existing cluster will use resources, but it’s also important to consider that your existing Redpanda clusters may already have enough processing headroom. In that case, shadowing does the work directly on the broker, and you get even more value from your existing investment.

3. Performance

Just like the rest of the Redpanda broker, the Shadowing components are written in high-performance C++, which means that not only do you get great replication performance, but there’s also no Kafka Connect and no JVM tuning in sight. Woohoo!

As an illustration of the performance, I recently scale-tested shadowing, driving the source cluster at 2.5 GiB/s. During that experiment, I was able to replicate with a total lag (across all topics) that was consistently lower than 10,000 messages—on a workload producing 2.5 million messages per second—giving us an effective RPO of around 4 milliseconds on average.
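For anyone who wants to check the arithmetic, the effective RPO is simply the replication lag divided by the produce rate. Here’s a small sketch using the figures quoted above; the function name is ours, and treating lag in messages as a stand-in for time assumes a steady produce rate.

```python
def effective_rpo_ms(lag_messages: float, messages_per_second: float) -> float:
    """Convert a replication lag in messages into milliseconds of data."""
    if messages_per_second <= 0:
        raise ValueError("messages_per_second must be positive")
    return lag_messages * 1000.0 / messages_per_second


# Figures from the scale test: lag under 10,000 messages at 2.5M messages/s.
rpo_ms = effective_rpo_ms(lag_messages=10_000, messages_per_second=2_500_000)
print(f"{rpo_ms:.1f} ms")  # 4.0 ms
```

Because 10,000 messages was the ceiling on the observed lag, the figure is best read as a rough bound for that particular workload and not as a guarantee for yours.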

Shadow linking also scales naturally with the cluster, both vertically and horizontally. If you use bigger nodes with more cores, Redpanda’s internal shared-nothing architecture can use that to its fullest. If you scale out the cluster and add more nodes, we will use them to increase the shadowing parallelism, all without you needing to tune anything.

Switchover/failover

When a link is active, data is flowing to the shadow cluster and the topics being written to are read-only to other producers. So what happens when the source cluster gets hit with a meteorite? (Or more likely, you’re having an outage in a specific cloud region?)

This is where failover comes in. When you fail over a link, either by topic or entirely, the replication flows stop and the linked topics become writable to regular producers. At this point, you can migrate your consumers and producers by reconfiguring them to point directly to the shadow cluster instead of the source cluster and continue where you left off.
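On the client side, the reconfiguration can be as small as swapping the broker addresses. The sketch below builds a consumer configuration per site using the standard Kafka client property names (bootstrap.servers, group.id, enable.auto.commit); the hostnames, the SITES table, and the consumer_config helper are hypothetical placeholders, and how you actually roll out the change depends on your deployment.

```python
# Hypothetical broker addresses for each cluster.
SITES = {
    "source": "source-broker-0.example.com:9092,source-broker-1.example.com:9092",
    "shadow": "shadow-broker-0.example.com:9092,shadow-broker-1.example.com:9092",
}


def consumer_config(active_site: str, group_id: str) -> dict:
    """Build a Kafka consumer config that points at the currently active site."""
    return {
        "bootstrap.servers": SITES[active_site],
        "group.id": group_id,          # same group before and after failover
        "enable.auto.commit": False,   # commit explicitly after processing
    }


before = consumer_config("source", "orders-service")
after = consumer_config("shadow", "orders-service")

# Only the broker addresses differ; topic names and group ID stay the same.
changed = sorted(key for key in before if before[key] != after[key])
print(changed)  # ['bootstrap.servers']
```

Keeping the group ID unchanged is what lets the consumer pick up from the committed offsets that were replicated to the shadow cluster.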

Keep in mind that if you have an app-level outage, you don’t need to fail over the whole link; just fail over individual topics as needed. Sorted. And if an emergency does happen, we’ve got you covered! Not only do we have a world-class support team on standby, but we also have a failover runbook for you to use. Definitely one to keep bookmarked!

Reciprocal shadowing

As we’ve seen above, shadow links are unidirectional: data always flows from the source cluster to the shadow, in an active-passive manner, with the shadow cluster serving only as a backup. Some organizations may wish to distribute risk by having shadow clusters serve primary roles to get the most out of their infrastructure.

Let’s say you run your business in two regions, and have producers (and Redpanda clusters) based in each region. Under normal circumstances, you’d want each producer to write to their local Redpanda cluster in order to get the lowest produce latencies, rather than having some of them always write to a distant cluster. This kind of reciprocal active-passive architecture, in which both clusters are active and usable, can still be achieved with parallel shadow links.

In this deployment architecture, each cluster acts as both a source cluster and a shadow at the same time:

Reciprocal active-passive Shadowing architecture

Usage

Running a reciprocal active-passive cluster pair is as simple as configuring two shadow links — one on each cluster. This design benefits from using a consistent prefix to name topics and consumer groups, identifying their source site. In the example above, the prefixes of a_ and b_ in the topic names indicate which cluster they originate in.

While not strictly necessary, the name prefixing is helpful for multiple reasons:

Reduces the likelihood of topic/group naming clashes between sites
Simplifies shadow link configuration (topics and groups can be selected for replication on the basis of the prefix rather than needing a static list of topics and groups)
Helps operators know at a glance which site a topic originates from
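As a rough illustration of why prefixes simplify link configuration, here’s how prefix-based selection behaves on a pair of clusters. The topic names and the select_by_prefix helper are made up for this example; in a real link, the selection is expressed in the shadow link configuration, not in client code.

```python
def select_by_prefix(names, prefix):
    """Return the names that originate from the site with the given prefix."""
    return sorted(name for name in names if name.startswith(prefix))


# Each cluster holds its own topics plus shadows of the other site's topics.
topics_on_a = ["a_orders", "a_payments", "b_orders", "_schemas"]
topics_on_b = ["b_orders", "a_orders", "a_payments", "_schemas"]

# The link defined on cluster B pulls only topics that originate on A,
# and the link defined on cluster A pulls only topics that originate on B.
link_on_b = select_by_prefix(topics_on_a, "a_")
link_on_a = select_by_prefix(topics_on_b, "b_")

print(link_on_b)  # ['a_orders', 'a_payments']
print(link_on_a)  # ['b_orders']
```

Selecting by prefix also keeps each link from trying to replicate the other site’s shadow topics back to where they came from, and new topics are picked up without editing a static list.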

In addition, a primary site for schema registry would need to be chosen (since both sites will use _schemas).

Producing and consuming

When running a reciprocal active-passive cluster pair, producers will be configured as expected: simply writing to their local topic. Consuming messages is conceptually a little more complex, in that there are now two topics that need to be read by the same consumer group (local and shadow). In practice, this just means a little more configuration of the consuming client.

Producing and consuming in an active-active (mutual shadowing) design: data flows from producers to the Site A and Site B topics and, via shadow links, to consumers.
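One way to handle the extra consumer configuration is a pattern subscription that covers both the local topic and its shadow counterpart. Many Kafka clients accept a regular expression when subscribing; the sketch below only builds and checks such a pattern, and the a_ and b_ prefixes, the topic names, and the subscription_pattern helper are assumptions for illustration.

```python
import re


def subscription_pattern(base_topic, site_prefixes=("a_", "b_")):
    """Build a regex matching one logical topic under every site prefix."""
    alternatives = "|".join(re.escape(prefix) for prefix in site_prefixes)
    return re.compile(rf"^(?:{alternatives}){re.escape(base_topic)}$")


pattern = subscription_pattern("orders")

# Topics visible on one site: local topics plus shadows of the other site's.
visible_topics = ["a_orders", "b_orders", "a_payments", "orders", "a_orders_dlq"]
matched = [topic for topic in visible_topics if pattern.match(topic)]
print(matched)  # ['a_orders', 'b_orders']
```

Anchoring the pattern at both ends keeps unrelated topics, such as a dead-letter topic that shares the same stem, out of the subscription.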

As you might expect with this architecture, both sites support simultaneous producing and consuming. Failover in either direction (from A to B, or B to A) is possible, making this design fault-tolerant to an outage in either location.

Want to know more?

Shadow linking addresses the core challenge of real-time data replication with both simplicity and performance. Now that you’re ready to learn more about shadowing, we have options for you:

For those of you who want to try Shadowing out, we’ve put together a demo that spins up a couple of Redpanda clusters on a local Kubernetes cluster like Minikube or Kind, and connects them with shadowing (all driven by the Redpanda operator). This shows the basic workflow for configuring the link, seeing data flow, and failing over.
You can also learn more about Shadow Linking in self-managed and cloud environments by reading our lovely documentation (or asking our AI agent about it).
Come say hi in our Redpanda Community Slack! Bring your questions or just let us know how you get on.
