Remote Read Replicas: Read-only topics in tiered storage

Learn about the newest feature of Redpanda Tiered Storage: remote, read-only replica topics.

August 17, 2022

Shadow Indexing, first announced back in 2021, is a core part of Redpanda’s Intelligent Data API.

The first step in bringing it to light was to develop archival storage, the subsystem that uploads data to the cloud. The data in cloud storage includes topic and partition manifests that make it self-sufficient and portable.

NOTE: Redpanda communicates via the S3 API, and although many providers have cloud storage that uses this protocol, we only officially support AWS S3 and GCP Cloud Storage. For development, you can also use MinIO.

We then built topic recovery, the feature that allows us to restore local topics from the archived data. After that, we launched tiered storage, giving Redpanda users a way to save on storage costs by offloading log segments to cloud storage.

Now, like Voltron, whose five mini-bots combine to make one super bot, all of these standalone features combine in v22.2 to enable Remote Read Replicas.

What are Remote Read Replicas?

Remote Read Replicas are read-only topics on one cluster that mirror a topic on a different cluster. They can be used to serve any consumer without increasing the load on a production cluster. When a user has a Redpanda topic with archival storage enabled, they can create a separate cluster for the consumer and populate its topics from the cloud storage. (The Redpanda documentation covers Remote Read Replicas in detail.)

The scope of this feature is limited only by your imagination. For example, if you have a multi-region S3 bucket, data archived to S3 in one region will replicate to other regions, where Remote Read Replicas make it available to consumers from a Redpanda cluster that’s close by.

Think of it like a CDN, but for streaming data.

PRO TIP: If you’re using AWS, enable transfer acceleration on your buckets to replicate the archived data as quickly as possible.

Because you can cherry-pick which topics are mirrored, read replicas afford unparalleled flexibility in cluster size for read-only workloads, regardless of the size of the data set.

Sounds pretty neat, right? But how does it work?

Shadow Indexing powers Remote Read Replicas

Shadow Indexing does the heavy lifting for Remote Read Replicas. It already handles moving the data from local to cloud storage, and the data in the cloud already contains everything necessary to restore it into a cluster.

Building a system to safely restore it into multiple clusters at the same time was a logical next step, so we configured Shadow Indexing to choose between serving local or remote data. For Remote Read Replicas it always uses remote data.

This is the best way to use the data that you already have in cloud storage: make it available for your consumers, wherever they are, without creating an operational impact on the production cluster.

It’s fast, reliable and safe, just how we like it.

We did it by adding the configuration for a read replica topic and propagating it to the storage level. When a client creates a Remote Read Replica topic, it tells the local cluster where to pull the data from, and this information is sent to all nodes and written to the controller log as part of the topic creation.

When we change the format of data that Redpanda writes to the controller log, we’re always thinking about compatibility across upgrades. Redpanda v22.2 understands the older log format, but v22.1 won’t understand the new format. We proactively prevent compatibility issues through Redpanda’s feature manager, which ensures that no new features are enabled until all nodes are running a version that supports them.
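The gating rule described above can be sketched in a few lines. This is a hypothetical model, not Redpanda's feature manager: the names `FEATURE_MIN_VERSION` and `feature_is_active`, the feature key, and the version tuples are all assumptions made for illustration.

```python
# Hypothetical model of cluster-wide feature gating; not Redpanda's actual code.
from typing import Dict, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (22, 2) for v22.2

# Assumed mapping from a feature name to the first release that supports it.
FEATURE_MIN_VERSION: Dict[str, Version] = {"remote_read_replicas": (22, 2)}


def feature_is_active(feature: str, node_versions: Dict[str, Version]) -> bool:
    """A feature turns on only when every node runs a release that supports it."""
    required = FEATURE_MIN_VERSION[feature]
    # An empty cluster view activates nothing.
    return bool(node_versions) and all(
        version >= required for version in node_versions.values()
    )


mid_upgrade = {"node-0": (22, 2), "node-1": (22, 1), "node-2": (22, 2)}
upgraded = {"node-0": (22, 2), "node-1": (22, 2), "node-2": (22, 2)}

print(feature_is_active("remote_read_replicas", mid_upgrade))  # False
print(feature_is_active("remote_read_replicas", upgraded))     # True
```

While one node is still on v22.1, the new controller log format is never written, so that node never sees a record it can't parse.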

Logic for uploading and downloading new data

When we explained the architecture for Shadow Indexing in an earlier blog article, we showed that the scheduler_service creates an ntp_archiver for each partition.

In the origin cluster the ntp_archiver sits in the write path, and it understands how to communicate with the cloud storage to upload log segments and metadata.

We decided that since it already knows how to talk to the cloud, this was the best place to expand functionality. We gave ntp_archiver additional powers so it can monitor changes in remote partition manifests, download log segments, and truncate the partition manifest if it detects that event data was deleted from the object store.

When ntp_archiver in a remote cluster detects that the object store has new log segments, it fetches them and forwards them to archival_metadata_stm, and from there they’re added to the partition.
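To make the download side concrete, here is a minimal sketch of a single sync pass: compare the replica's view of a partition manifest with the one in the object store, pick up new segments, and drop segments that no longer exist remotely. The `SegmentMeta` type, the function names, and the segment names are assumptions for illustration; they are not the real ntp_archiver or archival_metadata_stm interfaces.

```python
# Hypothetical sketch of one read-replica sync pass. The data shapes and
# function names are assumptions for illustration, not Redpanda internals.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SegmentMeta:
    base_offset: int
    last_offset: int


Manifest = Dict[str, SegmentMeta]  # segment name -> offsets it covers


def diff_manifests(local: Manifest, remote: Manifest) -> Tuple[List[str], List[str]]:
    """Return (segments to add, segments to drop), each ordered by base offset."""
    to_add = sorted((n for n in remote if n not in local),
                    key=lambda n: remote[n].base_offset)
    to_drop = sorted((n for n in local if n not in remote),
                     key=lambda n: local[n].base_offset)
    return to_add, to_drop


def sync_once(local: Manifest, remote: Manifest) -> Manifest:
    """Apply the diff and return the replica's updated view of the partition."""
    to_add, to_drop = diff_manifests(local, remote)
    updated = {n: m for n, m in local.items() if n not in to_drop}
    updated.update({n: remote[n] for n in to_add})
    return updated


# The replica knows two segments; the origin has since uploaded a third,
# and the oldest one has been deleted from the object store.
local = {"seg-0": SegmentMeta(0, 99), "seg-100": SegmentMeta(100, 199)}
remote = {"seg-100": SegmentMeta(100, 199), "seg-200": SegmentMeta(200, 299)}

print(diff_manifests(local, remote))     # (['seg-200'], ['seg-0'])
print(sorted(sync_once(local, remote)))  # ['seg-100', 'seg-200']
```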

You might be asking, “How does ntp_archiver know whether it should upload data to the cloud or watch the cloud for changes and download the data?”

This logic sits inside of the scheduler_service, which knows from the partition configuration if it’s part of a read replica.

When an operator sets the redpanda.remote.readreplica flag while creating a topic, the metadata flows all the way from create_request to create_topic_command to topic_properties, is replicated across the cluster, and becomes part of the partition configuration that scheduler_service uses to control the behavior of ntp_archiver.
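As a rough illustration of that decision, the sketch below picks an archiver mode from a partition's configuration. `ArchiverMode`, `PartitionConfig`, and `archiver_mode` are hypothetical names, and the real scheduler_service almost certainly weighs more than these two fields.

```python
# Hypothetical sketch: how a scheduler could pick an archiver mode from the
# partition configuration. Class and field names are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ArchiverMode(Enum):
    UPLOAD = "upload"  # origin cluster: push segments and manifests
    SYNC = "sync"      # read replica: watch the bucket and download
    IDLE = "idle"      # no cloud storage role for this partition


@dataclass(frozen=True)
class PartitionConfig:
    remote_write: bool = False                 # from redpanda.remote.write
    read_replica_bucket: Optional[str] = None  # from redpanda.remote.readreplica


def archiver_mode(cfg: PartitionConfig) -> ArchiverMode:
    # A read replica never uploads, so this check comes first.
    if cfg.read_replica_bucket:
        return ArchiverMode.SYNC
    if cfg.remote_write:
        return ArchiverMode.UPLOAD
    return ArchiverMode.IDLE


print(archiver_mode(PartitionConfig(remote_write=True)))               # ArchiverMode.UPLOAD
print(archiver_mode(PartitionConfig(read_replica_bucket="a-bucket")))  # ArchiverMode.SYNC
```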

How to create a Remote Read Replica

Remote Read Replicas are duplicates of an existing topic, so you have to start with a cluster and a topic that will be the origin of the data. This origin topic needs to be configured with archival storage, which you can enable in one of the following ways:

  1. If you set cloud_storage_enable_remote_write in the cluster configuration, then all topics will archive data to the cloud.
  2. If you set redpanda.remote.write in the topic configuration, then only the topics with this setting will archive data to the cloud.

Whichever one you choose, you also have to set the following cluster properties:

  • cloud_storage_enabled
  • cloud_storage_region
  • cloud_storage_bucket
  • cloud_storage_access_key
  • cloud_storage_secret_key
  • cloud_storage_api_endpoint

Future releases will include support for IAM roles and rotating credentials, too.

Once you’ve set all of these on the origin cluster, topic data will be uploaded to cloud storage and will be ready for consumption by remote clusters.
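If you script your cluster setup, a small helper can check that none of the required properties is missing and render the commands to run. This sketch assumes your rpk version provides `rpk cluster config set`; the helper name and every value in `example` are placeholders, not real credentials or endpoints.

```python
# Sketch: render the rpk commands that set the cluster properties listed above.
# All values in `example` are placeholders, not real credentials or endpoints.
import shlex
from typing import Dict, List

REQUIRED_PROPERTIES = [
    "cloud_storage_enabled",
    "cloud_storage_region",
    "cloud_storage_bucket",
    "cloud_storage_access_key",
    "cloud_storage_secret_key",
    "cloud_storage_api_endpoint",
]


def cluster_config_commands(props: Dict[str, str]) -> List[str]:
    """Return one shell-quoted command per required property, in list order."""
    missing = [key for key in REQUIRED_PROPERTIES if key not in props]
    if missing:
        raise ValueError(f"missing cluster properties: {', '.join(missing)}")
    return [
        shlex.join(["rpk", "cluster", "config", "set", key, props[key]])
        for key in REQUIRED_PROPERTIES
    ]


example = {
    "cloud_storage_enabled": "true",
    "cloud_storage_region": "us-east-1",
    "cloud_storage_bucket": "example-archive-bucket",
    "cloud_storage_access_key": "EXAMPLE_ACCESS_KEY",
    "cloud_storage_secret_key": "EXAMPLE_SECRET_KEY",
    "cloud_storage_api_endpoint": "s3.us-east-1.amazonaws.com",
}

for command in cluster_config_commands(example):
    print(command)
```

The helper only prints the commands, so you can review them before running anything against a live cluster.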

The remote cluster can be anywhere, but the most common use case is to have it in a different region, closer to the consumers. If you’re doing that, remember to configure your cloud storage to be multi-regional so that the remote cluster is pulling from nearby storage.

The remote cluster must have cloud_storage_enabled set in a cluster configuration file, and you’ll also have to provide information about the cloud provider, topic name, and the necessary credentials to access the storage. Our documentation has all of the information you need to set this up.

When you create a topic in Redpanda, you also specify how many partitions the topic will have. Partitions allow you to balance the load when using a consumer group with a topic. Each of the partitions is a separate directory, and because the data is already spread out across these partitions, it’s not possible to change the number of partitions after you’ve created the topic.

Redpanda also needs to know the initial_revision_id of the original topic when it creates topics in the remote cluster because it uses the value as part of the data directory path for identifying partitions.

Every time someone creates or deletes a topic, Redpanda increments revision_id. Segment paths in cloud storage have to be stable for Redpanda to read data that was written in earlier revisions, so the archival storage feature uses initial_revision_id for the path. This is the value of revision_id from when the partition was created.
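A toy model shows why the path is built from initial_revision_id rather than the current revision_id. Everything here is a simplification under stated assumptions: the `Cluster` class, the single revision counter, and the `topic/partition_revision/segment` layout are illustrative and do not reproduce Redpanda's real object key format.

```python
# Toy model of revision ids and stable segment paths. The path layout and
# class names are assumptions for illustration, not Redpanda's real format.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    revision_id: int = 0
    # topic name -> revision_id at the moment the topic was created
    topics: Dict[str, int] = field(default_factory=dict)

    def create_topic(self, name: str) -> None:
        self.revision_id += 1
        self.topics[name] = self.revision_id

    def delete_topic(self, name: str) -> None:
        self.revision_id += 1
        del self.topics[name]


def segment_path(topic: str, partition: int, initial_revision_id: int, segment: str) -> str:
    """Build a key from values that never change after the topic is created."""
    return f"{topic}/{partition}_{initial_revision_id}/{segment}"


cluster = Cluster()
cluster.create_topic("orders")  # revision_id becomes 1
before = segment_path("orders", 0, cluster.topics["orders"], "seg-0")

cluster.create_topic("clicks")  # revision_id becomes 2
cluster.delete_topic("clicks")  # revision_id becomes 3
after = segment_path("orders", 0, cluster.topics["orders"], "seg-0")

print(cluster.revision_id)  # 3
print(before == after)      # True
print(after)                # orders/0_1/seg-0
```

Had the path used the cluster's current revision_id, the same segment would be looked up under `orders/0_3/` after the unrelated topic changes, and the data written earlier would not be found.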

For Redpanda to find the read replica's data in the cloud, three parameters should match for the original topic and the read replica: topic name, replication factor, and initial revision id.

The magic happens when you create a Remote Read Replica topic in the remote cluster. You only need to configure it to have the same topic name and point to a bucket, and Redpanda will configure the replication factor and initial_revision_id for you from the information in the cloud.

Creating a read replica topic is almost identical to the way you create a normal topic, except that you’ll add a configuration item that tells Redpanda the bucket where it can find the archived data.

rpk topic create <topic name> -c redpanda.remote.readreplica=<bucket name>

NOTE: The topic name in the remote cluster must match the topic name in the origin cluster.

Redpanda will begin to pull from cloud storage to populate the topic data, and all of your consumers can read from the remote cluster as if they were reading from the origin cluster. You don’t need to change anything in your code for this to work, which is kind of amazing.

Redpanda helps you do more with fewer resources

Back when muscle cars were in vogue, giving an engine more fuel wasn’t enough to make a car go faster. Combustion engines burn fuel with air, and feeding the engine more air (and giving it more room to exhaust the gases) makes it more powerful and more efficient. Fuel is important, but air is even more important.

Cloud computing works in a similar way. You can spend more money (fuel) on bigger systems (engines), or you can spend less money on smaller, optimized systems and use software (air) that makes them more efficient.

The Remote Read Replicas feature in v22.2 is just one of the ways that Redpanda helps you do more with fewer resources. Features like this open the door for new ways to process data, like using remote clusters for offline training of machine learning models, building an edge streaming CDN, or doing software development with real data from a MinIO cluster in your office.

We’re exploring ways to expand this feature for our customers, and we want to hear how you’d like to use it. Do you want to make replicas for entire clusters with a single command? Do you want to restore an entire cluster, including data and metadata, for partition and leader balancing? What other ideas do you have? Come tell us!

The best way to share your ideas with us is to join the discussion on GitHub in the Redpanda repository. If you’re not already in our Slack community, you can join us there too to ask questions and get help with your setup. We also have the Redpanda University with free self-paced training to improve your knowledge of Redpanda and data streaming.
