Schema Registry: The event is the API

Schema registry provides tools for describing your events.

September 7, 2021

TL;DR Takeaways:
How can I register a schema in the Redpanda schema registry?

Schemas are registered against a subject, typically in the form {topic}-key or {topic}-value. You can register a schema by posting the schema to the /subjects/{subject-name}/versions endpoint with the Content-Type of application/vnd.schemaregistry.v1+json. The response will include a unique ID for the schema in the Redpanda cluster.

How can I retrieve a schema from the Redpanda schema registry?

You can retrieve a schema directly using its unique ID by sending a GET request to the /schemas/ids/{id} endpoint. You can also retrieve a schema by subject and version using the /subjects/{subject-name}/versions/{version-number} endpoint. To get the latest version of a schema for a subject, replace the version number with 'latest' in the endpoint.

How can I update my existing Redpanda installation to take advantage of the schema registry?

To use the schema registry in an existing Redpanda installation, you need to update to the latest version of Redpanda. If you are setting up a new instance, you can follow the instructions in the Linux, macOS, Kubernetes, or Docker quick start guides provided on the Redpanda website.

What is the purpose of a schema in an event-driven system?

Schemas in event-driven systems serve as the contract between the producer and the consumer. The schema is used to document and evolve the API. It can be used as human-readable documentation for an API, to verify data conformity to the API, to generate serialisers for the data, and to evolve the API with predefined levels of compatibility. This allows new versions of services to be rolled out independently.

What is the Redpanda schema registry subsystem?

The Redpanda schema registry subsystem provides an interface for managing schemas. It is built into the Redpanda binary, with schemas stored on the same raft-based storage engine. The RESTful interface is available on every broker, ensuring high availability. There are no new binaries to install, no new services to deploy and maintain, and the default configuration just works.

Heads up: there's a newer version of this post.

Introduction

Highly scalable, loosely coupled architectures often use an asynchronous event-driven design. In such systems, the contract between the producer and the consumer is the event - the event is the API.

It's important to document the API, and it's important to be able to evolve it. This is often done with a schema, written in a format such as Apache Avro, JSON Schema, or Protobuf.

We're pleased to announce the beta release of the schema registry subsystem of Redpanda, which provides an interface for managing schemas.

The schema registry is built into the Redpanda binary: schemas are stored on the same Raft-based storage engine, and the RESTful interface is available on every broker. Your schemas get the same high availability as your data, there's nothing new to deploy, and it's available today.

To take advantage of the schema registry in an existing Redpanda installation, make sure you update to the latest version. Otherwise, follow the instructions in the Linux, macOS, Kubernetes, or Docker quick start guides to spin up a new Redpanda instance.

If you want to leave the infrastructure issues to us, sign up for Redpanda Cloud for the simplest way to run Redpanda.

To get down to business, skip ahead to the example.

Overview

A loosely coupled architecture not only reduces dependencies in the code, it also reduces communication overhead between and within teams. By defining the API, or in this case the event, with a schema, disparate teams can start work on the subsystems that produce and consume those events with minimal communication overhead.

Operational complexity

At Redpanda, we like to make things simple. Redpanda is an Apache Kafka®-compatible event streaming platform that eliminates ZooKeeper® and the JVM, autotunes itself for modern hardware, and ships in a single binary.

We've built the schema registry directly into Redpanda; there are no new binaries to install, no new services to deploy and maintain, and the default configuration just works.

Schemas are stored in a standard compacted topic. We use optimistic concurrency control at the topic level, so mutating REST calls can be sent to any broker. There's no need to configure leadership or failover strategies, because every broker is symmetric.

Schema

A schema can be used as human-readable documentation for an API, to verify that data conforms to that API, to generate serialisers for the data, and to evolve the API with predefined levels of compatibility, allowing new versions of services to be rolled out independently.

Some data encodings are somewhat self-describing, but that can make them verbose. Some encodings are extensible. JSON, for example, pairs every property value with a property name. The name isn't part of the information, but it allows new fields to be easily added by the producer and ignored by the consumer.

A schema is an external mechanism to describe the data and its encoding, allowing a reduction in the amount of data transmitted, while keeping the same information. It also allows defaults for new fields, which means that it's possible to decouple the rollout of producers and consumers.
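To make the point about defaults concrete, here is a deliberately simplified model of how a consumer's schema can fill in a field that older producers never wrote. It is not an Avro implementation; the `unit` field, its default, and the `read_with_schema` helper are hypothetical names used only for illustration.

```python
# Reader schema of a newer consumer. It expects a "unit" field that older
# producers never wrote, so the schema declares a default for it.
# (The "unit" field and its default are hypothetical.)
READER_SCHEMA = {
    "type": "record",
    "name": "sensor_sample",
    "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "value", "type": "long"},
        {"name": "unit", "type": "string", "default": "celsius"},
    ],
}


def read_with_schema(record, reader_schema):
    """Project a decoded record onto the reader's schema.

    Fields the writer did not send are filled from the schema's defaults;
    fields the reader does not know about are dropped.
    """
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result


# A record produced before the "unit" field existed.
old_record = {"timestamp": 1631000000000, "value": 21}
print(read_with_schema(old_record, READER_SCHEMA))
```

Because the missing field has a default, the consumer can be upgraded before every producer is, which is the decoupling described above.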

Example

Start Redpanda

Let's jump right in and start Redpanda using Docker on Linux:

docker network create redpanda-sr
docker volume create redpanda-sr
docker run \
 --pull=always \
 --name=redpanda-sr \
 --net=redpanda-sr \
 -v "redpanda-sr:/var/lib/redpanda/data" \
 -p 8081:8081 \
 -p 8082:8082 \
 -p 9092:9092 \
 --detach \
 docker.vectorized.io/vectorized/redpanda start \
 --overprovisioned \
 --smp 1 \
 --memory 1G \
 --reserve-memory 0M \
 --node-id 0 \
 --check=false \
 --pandaproxy-addr 0.0.0.0:8082 \
 --advertise-pandaproxy-addr 127.0.0.1:8082 \
 --kafka-addr 0.0.0.0:9092 \
 --advertise-kafka-addr redpanda-sr:9092

Now we're ready to start using the schema registry!

Endpoints are documented with Swagger at http://localhost:8081/v1 or on SwaggerHub.

The curl examples pipe each response through jq to prettify and process the JSON.

For Python, we'll use the popular requests module (pip install requests).

For the rest of the guide, we'll assume the following setup in an interactive Python session:

import requests
import json
def pretty(text):
 print(json.dumps(text, indent=2))

base_uri = "http://localhost:8081"

Schemas

The currently supported schema type is AVRO; we plan to support JSON and PROTOBUF.

You can query the schema registry for the supported types:

curl -s \
 "http://localhost:8081/schemas/types" \
 | jq .

[
 "AVRO"
]
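A possible Python counterpart of the curl call is sketched below. It uses only the standard library's urllib so that it runs without extra packages; with the requests session above, `requests.get(f"{base_uri}/schemas/types").json()` should be the equivalent. The helper names `get_json` and `supported_schema_types` are made up for this sketch.

```python
import json
import urllib.request

BASE_URI = "http://localhost:8081"  # same address as base_uri in the session above


def get_json(path, base_uri=BASE_URI):
    """GET a schema registry path and decode the JSON response body."""
    with urllib.request.urlopen(base_uri + path) as response:
        return json.loads(response.read().decode("utf-8"))


def supported_schema_types(base_uri=BASE_URI):
    """Return the schema types reported by the registry."""
    return get_json("/schemas/types", base_uri)


# With the Redpanda container from above running:
#   print(json.dumps(supported_schema_types(), indent=2))
```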

Publish a schema

Schemas are registered against a subject, typically in the form {topic}-key or {topic}-value.

Let's register an example Avro schema which represents a measurement from a sensor for the value of the sensor topic.

{
 "type": "record",
 "name": "sensor_sample",
 "fields": [
   {
     "name": "timestamp",
     "type": "long",
     "logicalType": "timestamp-millis"
   },
   {
     "name": "identifier",
     "type": "string",
     "logicalType": "uuid"
   },
   {
     "name": "value",
     "type": "long"
   }
 ]
}

We need to POST the Avro schema to the /subjects/sensor-value/versions endpoint with the Content-Type of application/vnd.schemaregistry.v1+json. Note that the schema is sent as an escaped JSON string inside the request body:

curl -s \
 -X POST \
 "http://localhost:8081/subjects/sensor-value/versions" \
 -H "Content-Type: application/vnd.schemaregistry.v1+json" \
 -d '{"schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"long\"}]}"}' \
 | jq

{
 "id": 1
}

The id is unique for the schema in the Redpanda cluster.
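Escaping the schema by hand, as in the curl call, is error-prone. In Python the double encoding can be left to json.dumps. The following is a minimal standard-library sketch of the same POST; the helpers `subject_for`, `registration_body`, and `register_schema` are hypothetical names, not part of any Redpanda client.

```python
import json
import urllib.request

BASE_URI = "http://localhost:8081"  # same address as base_uri in the session above
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"

SENSOR_SCHEMA = {
    "type": "record",
    "name": "sensor_sample",
    "fields": [
        {"name": "timestamp", "type": "long", "logicalType": "timestamp-millis"},
        {"name": "identifier", "type": "string", "logicalType": "uuid"},
        {"name": "value", "type": "long"},
    ],
}


def subject_for(topic, part="value"):
    """Build a subject name in the {topic}-key / {topic}-value form."""
    if part not in ("key", "value"):
        raise ValueError("part must be 'key' or 'value'")
    return f"{topic}-{part}"


def registration_body(schema):
    """Wrap the schema in the request envelope.

    The schema is first serialised to a compact JSON string, and that string
    becomes the value of the "schema" property of the outer JSON object.
    """
    schema_text = json.dumps(schema, separators=(",", ":"))
    return json.dumps({"schema": schema_text}).encode("utf-8")


def register_schema(subject, schema, base_uri=BASE_URI):
    """POST the schema to the subject and return the decoded response."""
    request = urllib.request.Request(
        f"{base_uri}/subjects/{subject}/versions",
        data=registration_body(schema),
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# With the Redpanda container from above running:
#   print(register_schema(subject_for("sensor"), SENSOR_SCHEMA))
```

With the requests module, the same call would be `requests.post(url, data=registration_body(schema), headers={"Content-Type": CONTENT_TYPE}).json()`.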

Retrieve the schema by its ID

We can retrieve the schema directly using its ID:

curl -s \
 "http://localhost:8081/schemas/ids/1" \
 | jq .

{
 "schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"long\"}]}"
}
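The response wraps the schema in an escaped string, so a client has to decode JSON twice to get a usable structure. A small standard-library sketch of that step follows; `schema_by_id` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URI = "http://localhost:8081"  # same address as base_uri in the session above


def schema_by_id(schema_id, base_uri=BASE_URI):
    """Fetch a schema by its ID and return it as a Python dict."""
    with urllib.request.urlopen(f"{base_uri}/schemas/ids/{schema_id}") as response:
        envelope = json.loads(response.read().decode("utf-8"))
    # The "schema" property is itself a JSON document stored as a string,
    # so it needs a second decoding pass.
    return json.loads(envelope["schema"])


# With the Redpanda container from above running:
#   print(json.dumps(schema_by_id(1), indent=2))
```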

List the subjects

Now that a schema is associated with a subject, let's list the subjects:

curl -s \
 "http://localhost:8081/subjects" \
 | jq .

[
 "sensor-value"
]

Cool! We knew that, but now anyone can discover them.

Retrieve the schema versions for the subject

Schemas associated with subjects are versioned. That's how your API can evolve.

Let's query the versions for the sensor-value subject:

curl -s \
 "http://localhost:8081/subjects/sensor-value/versions" \
 | jq .

[
 1
]
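The two listing endpoints combine naturally into a simple discovery step: list the subjects, then ask each one for its versions. The sketch below does this with the standard library only; `versions_by_subject` is a hypothetical helper, and percent-encoding the subject name is a precaution added here rather than something the walkthrough requires.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URI = "http://localhost:8081"  # same address as base_uri in the session above


def get_json(path, base_uri=BASE_URI):
    """GET a schema registry path and decode the JSON response body."""
    with urllib.request.urlopen(base_uri + path) as response:
        return json.loads(response.read().decode("utf-8"))


def versions_by_subject(base_uri=BASE_URI):
    """Map each registered subject to its list of version numbers."""
    return {
        subject: get_json(f"/subjects/{quote(subject, safe='')}/versions", base_uri)
        for subject in get_json("/subjects", base_uri)
    }


# With the Redpanda container from above running:
#   print(json.dumps(versions_by_subject(), indent=2))
```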

Retrieve a schema for the subject

If we know the subject and the version we want, we can query directly:

curl -s \
 "http://localhost:8081/subjects/sensor-value/versions/1" \
 | jq .

{
 "subject": "sensor-value",
 "id": 1,
 "version": 1,
 "schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"long\"}]}"
}

Instead of a specific version, we can ask for the latest:

curl -s \
 "http://localhost:8081/subjects/sensor-value/versions/latest" \
 | jq .

{
 "subject": "sensor-value",
 "id": 1,
 "version": 1,
 "schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"long\"}]}"
}

It's also possible to query for just the schema by appending /schema to the query path. That unwraps the escaped schema:

curl -s \
 "http://localhost:8081/subjects/sensor-value/versions/latest/schema" \
 | jq .

{
 "type": "record",
 "name": "sensor_sample",
 "fields": [
   {
     "name": "timestamp",
     "type": "long",
     "logicalType": "timestamp-millis"
   },
   {
     "name": "identifier",
     "type": "string",
     "logicalType": "uuid"
   },
   {
     "name": "value",
     "type": "long"
   }
 ]
}
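The three queries in this section differ only in the path, which the following sketch captures in one function: a version number or "latest", with an optional /schema suffix. It uses the standard library only, and `subject_version_path` and `fetch_subject_version` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URI = "http://localhost:8081"  # same address as base_uri in the session above


def subject_version_path(subject, version="latest", schema_only=False):
    """Build the path for a subject's schema at a version number or "latest"."""
    path = f"/subjects/{quote(subject, safe='')}/versions/{version}"
    # Appending /schema asks for the unwrapped schema instead of the envelope.
    return f"{path}/schema" if schema_only else path


def fetch_subject_version(subject, version="latest", schema_only=False, base_uri=BASE_URI):
    """GET the schema (or its envelope) for a subject and decode the JSON body."""
    url = base_uri + subject_version_path(subject, version, schema_only)
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# With the Redpanda container from above running:
#   fetch_subject_version("sensor-value", 1)                    # envelope for version 1
#   fetch_subject_version("sensor-value")                       # envelope for the latest
#   fetch_subject_version("sensor-value", schema_only=True)     # just the schema
```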

Compatibility

There are several types of compatibility:

  • BACKWARD - Consumers using the new version can read data written with the previous version.
  • FORWARD - Consumers using the previous version can read data written with the new version.
  • FULL - Both backward and forward compatibility are ensured.
  • NONE - No compatibility is required.

BACKWARD, FORWARD, and FULL check against the most recent version only. To check against all registered versions for a subject, append _TRANSITIVE (for example, BACKWARD_TRANSITIVE).

The default global compatibility is BACKWARD.

Compatibility can be set explicitly for a subject:

curl -s \
 -X PUT \
 "http://localhost:8081/config/sensor-value" \
 -H "Content-Type: application/vnd.schemaregistry.v1+json" \
 -d '{"compatibility": "BACKWARD"}' \
 | jq .

{
 "compatibility": "BACKWARD"
}
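A standard-library sketch of the same PUT is shown below. The set of accepted level names is written out on the assumption that the registry uses the same spelling as the "BACKWARD" value in the curl call; `set_compatibility` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URI = "http://localhost:8081"  # same address as base_uri in the session above
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"

# Assumed spelling of the levels, following the "BACKWARD" value used with curl.
COMPATIBILITY_LEVELS = {
    "BACKWARD", "BACKWARD_TRANSITIVE",
    "FORWARD", "FORWARD_TRANSITIVE",
    "FULL", "FULL_TRANSITIVE",
    "NONE",
}


def set_compatibility(subject, level, base_uri=BASE_URI):
    """Set the compatibility level for one subject and return the response."""
    if level not in COMPATIBILITY_LEVELS:
        raise ValueError(f"unknown compatibility level: {level!r}")
    request = urllib.request.Request(
        f"{base_uri}/config/{subject}",
        data=json.dumps({"compatibility": level}).encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPE},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# With the Redpanda container from above running:
#   print(set_compatibility("sensor-value", "BACKWARD"))
```

Validating the level locally catches a misspelling such as "BACKWARDS" before a request is sent.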

Evolving a schema

Posting a backward-incompatible change to a subject will fail.

For example, changing the type of the value field from long to int is rejected, because a consumer on the new schema could not read values that were written as long:

{
 "type": "record",
 "name": "sensor_sample",
 "fields": [
   {
     "name": "timestamp",
     "type": "long",
     "logicalType": "timestamp-millis"
   },
   {
     "name": "identifier",
     "type": "string",
     "logicalType": "uuid"
   },
   {
     "name": "value",
     "type": "int"
   }
 ]
}

curl -s \
 -X POST \
 "http://localhost:8081/subjects/sensor-value/versions" \
 -H "Content-Type: application/vnd.schemaregistry.v1+json" \
 -d '{"schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"int\"}]}"}' \
 | jq

{
 "error_code": 409,
 "message": "Schema being registered is incompatible with an earlier schema for subject \"{sensor-value}\""
}
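A client that registers schemas automatically, for example in a deployment pipeline, will want to tell this rejection apart from other failures. The sketch below assumes that the HTTP status code mirrors the error_code of 409 shown in the body, which the curl output alone does not confirm; `try_register` is a hypothetical helper name.

```python
import json
import urllib.error
import urllib.request

BASE_URI = "http://localhost:8081"  # same address as base_uri in the session above
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def try_register(subject, schema, base_uri=BASE_URI):
    """Try to register a schema.

    Returns (True, response) on success and (False, error_body) when the
    registry answers with HTTP 409 (assumed to signal incompatibility).
    Any other HTTP error is re-raised.
    """
    request = urllib.request.Request(
        f"{base_uri}/subjects/{subject}/versions",
        data=json.dumps({"schema": json.dumps(schema)}).encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return True, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        if error.code != 409:
            raise
        # The error body carries the error_code and message shown above.
        return False, json.loads(error.read().decode("utf-8"))
```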

A backward-compatible change would be changing it from long to double, because Avro can promote a long to a double when reading:

{
 "type": "record",
 "name": "sensor_sample",
 "fields": [
   {
     "name": "timestamp",
     "type": "long",
     "logicalType": "timestamp-millis"
   },
   {
     "name": "identifier",
     "type": "string",
     "logicalType": "uuid"
   },
   {
     "name": "value",
     "type": "double"
   }
 ]
}

curl -s \
 -X POST \
 "http://localhost:8081/subjects/sensor-value/versions" \
 -H "Content-Type: application/vnd.schemaregistry.v1+json" \
 -d '{"schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"double\"}]}"}' \
 | jq

{
 "id": 2
}
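The two outcomes above follow from the type promotions defined in the Avro specification. The sketch below encodes those promotions for primitive types only and applies the BACKWARD rule to a single field; it is an illustration of the rule, not the check the registry actually runs, and the function names are hypothetical.

```python
# Avro primitive type promotions: data written as the key type can be read
# by a reader whose schema uses any of the listed types.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def reader_can_read(writer_type, reader_type):
    """True if a reader expecting reader_type can decode writer_type data."""
    return writer_type == reader_type or reader_type in PROMOTIONS.get(writer_type, set())


def is_backward_compatible(old_type, new_type):
    """BACKWARD: consumers on the new schema must read data written with the old one."""
    return reader_can_read(writer_type=old_type, reader_type=new_type)


print(is_backward_compatible("long", "int"))     # False: the change that was rejected
print(is_backward_compatible("long", "double"))  # True: the change that was accepted
```

Swapping the arguments gives the FORWARD check, which is why long to double is backward compatible but not forward compatible.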

Cleanup

Now we can clean up:

docker stop redpanda-sr
docker rm redpanda-sr
docker volume remove redpanda-sr
docker network remove redpanda-sr

Conclusion

We'll be adding more endpoints and more encodings. For an up-to-date list of features and their status see the schema registry features meta-issue on GitHub.

The schema registry is built on the same principles as Redpanda, but has not yet been optimized for performance. We are continuing to work on the schema registry, so make sure you join our Slack community to get updates on the progress!
