Using Debezium and Redpanda for change data capture (CDC)

Leveraging Redpanda's Compatibility with Debezium to stream changes from MySQL.

August 29, 2021
TL;DR Takeaways:
How can I set up a CDC stream using Redpanda and Debezium?

You can set up a CDC stream using Redpanda and Debezium by following the tutorial provided in this blog. It involves creating a Docker Compose file, running the docker-compose command, and setting up Kafka Connect to start capturing changes. The changes are then streamed from MySQL to Redpanda.

What are some popular use cases for Change Data Capture (CDC)?

CDC can be used for safe migration from legacy systems, division of monolithic data, real-time analytics, easy integrations, cache invalidations, and outbox patterns.

What happens after setting up Kafka Connect for CDC?

After setting up Kafka Connect for CDC, it starts reading binlogs from MySQL and streaming changes to Redpanda. Kafka Connect creates topics for each table and you can observe changes to the rows in the specified table. As you update, new records arrive in the consumer.

What is Change Data Capture (CDC)?

Change Data Capture (CDC) is a design pattern used to track and capture changes made to data in a database. It identifies and captures insertions, updates, and deletions in real-time or near real-time, allowing these changes to be propagated to downstream systems. CDC enables efficient data synchronization, real-time analytics, and event-driven architectures by capturing only the changes rather than performing full data refreshes. Common CDC implementation methods include database triggers, log-based CDC (reading transaction logs), and timestamp-based tracking. CDC is essential for maintaining data consistency across distributed systems, populating data warehouses, enabling real-time ETL processes, and supporting microservices architectures. With Redpanda, CDC events can be streamed and processed efficiently, enabling real-time data pipelines and event-driven applications.

What services are required to set up a CDC stream with Redpanda and Debezium?

The services required to set up a CDC stream with Redpanda and Debezium include Docker, Docker Compose, Redpanda, Kafka Connect, Debezium, and MySQL. These services need to be in the same network for them to be reachable from each other.


Introduction: How to set up a CDC stream in Redpanda

In this tutorial, you are going to build a CDC stream using Redpanda and Debezium. Note that the entire Kafka Connect ecosystem works out of the box with Redpanda, as Redpanda is API-compatible with Apache Kafka®.

What is change data capture (CDC)?

CDC is the process of recognizing when data has been changed in a source system so a downstream process or system can take action on that change.
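
To make "taking action on a change" concrete, here is a minimal sketch of a downstream process that keeps an in-memory copy of a table in sync. The event shape (an operation code plus the row before and after the change) is an assumption modelled on the Debezium events shown later in this tutorial, and the rows are made up for illustration.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_change(replica: Dict[int, Row], op: str,
                 before: Optional[Row], after: Optional[Row]) -> None:
    """Apply one change event to an in-memory copy of a table, keyed by `id`."""
    if op in ("c", "r", "u"):
        # create, snapshot read, update: store the new version of the row
        if after is None:
            raise ValueError(f"op {op!r} requires an 'after' row")
        replica[after["id"]] = after
    elif op == "d":
        # delete: drop the row if we are holding it
        if before is None:
            raise ValueError("op 'd' requires a 'before' row")
        replica.pop(before["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


# Hypothetical stream of changes (op, before, after) to a small table.
events = [
    ("c", None, {"id": 1, "name": "Ada"}),
    ("c", None, {"id": 2, "name": "Bob"}),
    ("u", {"id": 2, "name": "Bob"}, {"id": 2, "name": "Robert"}),
    ("d", {"id": 1, "name": "Ada"}, None),
]

replica: Dict[int, Row] = {}
for op, before, after in events:
    apply_change(replica, op, before, after)

print(replica)  # {2: {'id': 2, 'name': 'Robert'}}
```

The copy is updated one event at a time, without ever re-reading the full source table.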

Why would you use it?

Here are some popular use cases:

  • Safe migration from legacy systems
  • Division of monolithic data
  • Real-time analytics
  • Easy integrations
  • Cache invalidations
  • Outbox patterns

Useful links about technologies used in this tutorial

Note: Redpanda aims to be fully compatible with Kafka APIs, so all existing connectors should work with Redpanda without any changes.

Prerequisites

Quick tour of services

  • Redpanda - The modern data streaming platform
  • Kafka Connect - Integrations that connect Kafka with other data systems
  • Debezium - A set of plugins used by Kafka Connect to capture changes in databases
  • MySQL - Database

Debezium and Redpanda tutorial for CDC

Let's start our tutorial by looking at the Docker Compose file. We encourage you to set up a working directory, and follow along.

mkdir debezium && cd debezium && touch redpanda-debezium.compose.yml

# redpanda-debezium.compose.yml
version: "3.3"
services:
  redpanda:
    image: vectorized/redpanda
    ports:
      - "9092:9092"
      - "29092:29092"
    command:
      - redpanda
      - start
      - --overprovisioned
      - --smp
      - "1"
      - --memory
      - "1G"
      - --reserve-memory
      - "0M"
      - --node-id
      - "0"
      - --kafka-addr
      - PLAINTEXT://0.0.0.0:29092,OUTSIDE://0.0.0.0:9092
      - --advertise-kafka-addr
      - PLAINTEXT://redpanda:29092,OUTSIDE://redpanda:9092
      - --check=false
  connect:
    image: debezium/connect
    depends_on:
      - redpanda
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: "redpanda:9092"
      GROUP_ID: "1"
      CONFIG_STORAGE_TOPIC: "inventory.configs"
      OFFSET_STORAGE_TOPIC: "inventory.offset"
      STATUS_STORAGE_TOPIC: "inventory.status"
  mysql:
    image: debezium/example-mysql:1.6
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: debezium
      MYSQL_USER: mysqluser
      MYSQL_PASSWORD: mysqlpw

These services share the same Docker network, so they can reach each other, and each service name resolves to that container's IP address on the network. Once the stack is running, we can verify this by pinging Redpanda from the Kafka Connect container:

$ docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Names}}"
Output:
NAMES                 STATUS         NAMES
debezium_connect_1    Up 6 minutes   debezium_connect_1
debezium_mysql_1      Up 6 minutes   debezium_mysql_1
debezium_redpanda_1   Up 6 minutes   debezium_redpanda_1

$ docker exec -it debezium_connect_1 /bin/bash # starts and attaches to the shell inside container

[kafka@a48a914cf7f1 ~]$ ping redpanda
PING redpanda (192.168.16.3) 56(84) bytes of data.
64 bytes from debezium_redpanda_1.debezium_default (192.168.16.3): icmp_seq=1 ttl=64 time=0.152 ms
64 bytes from debezium_redpanda_1.debezium_default (192.168.16.3): icmp_seq=2 ttl=64 time=0.103 ms
^C
--- redpanda ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1010ms
rtt min/avg/max/mdev = 0.103/0.127/0.152/0.026 ms

The environment variables and values are intended to be self-explanatory, but you can go to the Docker Hub and read about their meaning.

Run docker-compose -f redpanda-debezium.compose.yml up

Once everything has successfully started, open a new terminal session and enter into the Redpanda shell.

$ docker exec -it debezium_redpanda_1 /bin/bash
redpanda@67f3306a7a30:/$ rpk topic list
  Name               Partitions  Replicas
  inventory.configs  1           1
  inventory.offset   25          1
  inventory.status   5           1

These are management topics required for Kafka Connect to run in distributed mode.

Let's do the same for MySQL:

$ docker exec -it debezium_mysql_1 /bin/bash
root@7119b581859f:/$ mysql -u mysqluser -pmysqlpw
mysql> use inventory; # this is example database pre-created during start-up
mysql> show tables;
+---------------------+
| Tables_in_inventory |
+---------------------+
| addresses           |
| customers           |
| geom                |
| orders              |
| products            |
| products_on_hand    |
+---------------------+
6 rows in set (0.00 sec)
# we are interested in customers

mysql> select * from customers;
+------+------------+-----------+-----------------------+
| id   | first_name | last_name | email                 |
+------+------------+-----------+-----------------------+
| 1001 | Sally      | Thomas    | sally.thomas@acme.com |
| 1002 | George     | Bailey    | gbailey@foobar.com    |
| 1003 | Edward     | Walker    | ed@walker.com         |
| 1004 | Anne       | Kretchmar | annek@noanswer.org    |
+------+------------+-----------+-----------------------+
4 rows in set (0.00 sec)

Now we need to start capturing changes. For this purpose, Kafka Connect exposes a REST API through which we can upload configuration parameters. For example:

{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "redpanda:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}

Some important parameters:

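  • connector.class - class that implements the capturing logic. In our case, it reads the MySQL binlogs.
  • database.server.* - identify the MySQL server. database.server.id is the numeric ID the connector uses as a replication client and must be unique; database.server.name is a logical name used as the prefix when creating topics.
  • database.include.list - list of database names to capture changes from. Here it is the inventory database.
  • database.history.kafka.topic - the connector records the database schema history (DDL statements) here and, after a restart, replays it to rebuild the table definitions that apply at the binlog position where it resumes reading.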

All we have to do is to upload this configuration to Kafka Connect:

Note: perform this curl command on the host machine. The Docker Compose file exposes port 8083.
curl --request POST \
  --url http://localhost:8083/connectors \
  --header 'Content-Type: application/json' \
  --data '{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "redpanda:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}'
# response should be 201
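
If you would rather script this step than use curl, the same request can be sketched with Python's standard library. This is a minimal sketch, not part of the original setup: the helper names are hypothetical, the base URL assumes port 8083 is published on localhost as in the Compose file, and the status check relies on the GET /connectors/<name>/status endpoint of the Kafka Connect REST API.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # port published by the Compose file

# Same configuration as in the curl command above.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.include.list": "inventory",
        "database.history.kafka.bootstrap.servers": "redpanda:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
    },
}


def build_register_request(base_url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /connectors request."""
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(base_url: str, body: dict) -> int:
    """Send the request and return the HTTP status code (201 on success)."""
    with urllib.request.urlopen(build_register_request(base_url, body)) as resp:
        return resp.status


def connector_status(base_url: str, name: str) -> dict:
    """Fetch the connector status, which reports connector and task state."""
    with urllib.request.urlopen(f"{base_url}/connectors/{name}/status") as resp:
        return json.load(resp)
```

With the stack running, register(CONNECT_URL, connector) should return 201, and connector_status(CONNECT_URL, "inventory-connector") should then describe the connector and its task. Note that urlopen raises urllib.error.HTTPError for error responses, for example if a connector with the same name is already registered.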

After executing the above curl command, Kafka Connect is set up. It will start reading binlogs from MySQL and streaming changes to Redpanda.

redpanda@67f3306a7a30:/$ rpk topic list
  Name                                  Partitions  Replicas
  dbserver1                             1           1
  dbserver1.inventory.addresses         1           1
  dbserver1.inventory.customers         1           1
  dbserver1.inventory.geom              1           1
  dbserver1.inventory.orders            1           1
  dbserver1.inventory.products          1           1
  dbserver1.inventory.products_on_hand  1           1
  inventory.configs                     1           1
  inventory.offset                      25          1
  inventory.status                      5           1
  schema-changes.inventory              1           1

As you can see, Kafka Connect created a topic for each table in the inventory database, the only database listed in database.include.list. We are interested in the dbserver1.inventory.customers topic, which carries changes to the customers table.
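
The per-table topic names in the listing follow a simple pattern: the value of database.server.name, then the database name, then the table name, joined by dots. A small sketch of that convention (the helper name is hypothetical):

```python
def change_topic(server_name: str, database: str, table: str) -> str:
    """Name of the topic that receives row changes for one table."""
    return f"{server_name}.{database}.{table}"


tables = ["addresses", "customers", "geom", "orders", "products", "products_on_hand"]
topics = [change_topic("dbserver1", "inventory", t) for t in tables]

print(topics[1])  # dbserver1.inventory.customers
```

This only covers the per-table topics. The remaining topics in the listing (dbserver1, schema-changes.inventory, and the three inventory.* management topics) are not produced by this pattern.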

redpanda@67f3306a7a30:/$ rpk topic consume dbserver1.inventory.customers
... # it will start streaming changes as json payload
{
  "key": {...},
  "message": {...},
  "partition": 0,
  "offset": 0,
  "timestamp": "2021-08-29T16:47:04.436Z"
}

This is the representation of a change to a row in the customers table. The message field contains details about the schema, the operation type, and the before and after values of the row.

{
  "schema": {
    "type": "struct",
    "fields": [ ... ],
    "optional": false,
    "name": "dbserver1.inventory.customers.Envelope"
  },
  "payload": {
    "before": null,
    "after": {
      "id": 1004,
      "first_name": "Anne",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "source": {
      "version": "1.6.1.Final",
      "connector": "mysql",
      "name": "dbserver1",
      "ts_ms": 1630246982521,
      "snapshot": "true",
      "db": "inventory",
      "sequence": null,
      "table": "customers",
      "server_id": 0,
      "gtid": null,
      "file": "mysql-bin.000008",
      "pos": 154,
      "row": 0,
      "thread": null,
      "query": null
    },
    "op": "r",
    "ts_ms": 1630246982521,
    "transaction": null
  }
}
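
As a rough illustration of how a consumer might read this envelope, the sketch below pulls out the operation, the source table, and the row images. The function name is hypothetical, the sample is a trimmed copy of the event above, and the mapping of operation codes (c, u, d, r) follows the Debezium documentation as we understand it, so verify it against your connector version.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Assumed meaning of the "op" codes; check the Debezium docs for your version.
OPERATIONS = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}

RowImage = Optional[Dict[str, Any]]


def summarize(message: str) -> Tuple[str, str, RowImage, RowImage]:
    """Return (operation, db.table, before, after) for one change event."""
    payload = json.loads(message)["payload"]
    op = OPERATIONS.get(payload["op"], f"unknown ({payload['op']})")
    table = f'{payload["source"]["db"]}.{payload["source"]["table"]}'
    return op, table, payload["before"], payload["after"]


# Trimmed copy of the snapshot event shown above.
sample = """
{
  "schema": {"type": "struct", "optional": false,
             "name": "dbserver1.inventory.customers.Envelope"},
  "payload": {
    "before": null,
    "after": {"id": 1004, "first_name": "Anne",
              "last_name": "Kretchmar", "email": "annek@noanswer.org"},
    "source": {"connector": "mysql", "name": "dbserver1",
               "db": "inventory", "table": "customers", "snapshot": "true"},
    "op": "r",
    "ts_ms": 1630246982521
  }
}
"""

print(summarize(sample)[:2])  # ('read (snapshot)', 'inventory.customers')
```

Here "before" is null and "op" is "r" because the row was read during the connector's initial snapshot rather than changed by a statement.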

Leave the consumer running and, in the MySQL session, try making changes to the customers table.

mysql> UPDATE customers SET first_name='Anne Marie' WHERE id=1004;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

As you update, you should see new records arriving in the consumer.
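
For an update, the new record should carry both the old and the new version of the row, which makes it easy to see exactly what changed. The payload below is a hand-written illustration of what the event for the UPDATE above might look like, not captured output; it assumes "op" is "u" and that both "before" and "after" are populated. The helper name is hypothetical.

```python
from typing import Any, Dict, Tuple


def changed_fields(before: Dict[str, Any],
                   after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each column whose value differs to its (old, new) pair."""
    columns = sorted(set(before) | set(after))
    return {
        col: (before.get(col), after.get(col))
        for col in columns
        if before.get(col) != after.get(col)
    }


# Illustrative payload for: UPDATE customers SET first_name='Anne Marie' WHERE id=1004;
payload = {
    "op": "u",
    "before": {"id": 1004, "first_name": "Anne",
               "last_name": "Kretchmar", "email": "annek@noanswer.org"},
    "after": {"id": 1004, "first_name": "Anne Marie",
              "last_name": "Kretchmar", "email": "annek@noanswer.org"},
}

print(changed_fields(payload["before"], payload["after"]))
# {'first_name': ('Anne', 'Anne Marie')}
```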

Feel free to have fun and create as many changes as you want!

Here is an overview of what is happening.

[Diagram: overview of the CDC pipeline]

Summary

You can see how easy it is to set up CDC streams using Redpanda. Feel free to play around and set up CDC streams from other popular database systems.

Credits

This tutorial is heavily inspired by the Debezium getting started tutorial. There you can find additional details about the configuration parameters and about the setup with Kafka and ZooKeeper, so feel free to check it out.

And, don't forget to join the Redpanda Community on Slack where you can share what you're working on and learn what others in our community are building.
