Introducing Iceberg output for Redpanda Connect

From any source to any schema — lakehouse ingestion made simple (and boring)

March 5, 2026

Apache Iceberg™ has become the table format that teams reach for when they want their streaming data to be queryable in the lakehouse. But for many, the "last mile" of this journey is where architectural complexity and hidden costs begin to pile up. Getting data there often means standing up heavyweight infrastructure and stitching together services just to move bytes from A to B. 

We wanted to make that a lot simpler.

Today we're announcing the Iceberg output for Redpanda Connect: a native component that writes streaming data to Iceberg tables directly from a declarative YAML pipeline. That means you can transform, enrich, and route streams before they land in your data lake, and query tables seconds later from any platform that supports Iceberg. See? Simple.

From stream to table, without the detours

If you're already running Redpanda, you might already be familiar with Iceberg Topics. They give you a zero-ETL path from broker to table that's streamlined for high-speed Kafka streams. Produce to a topic, and Redpanda handles the rest. For many workloads, that's all you need.

But maybe your data arrives from an HTTP webhook, a Postgres CDC stream, or a GCP Pub/Sub subscription. Maybe you need to normalize a payload, drop PII, or split a mixed event stream by type before anything hits the lakehouse. That's the gap this connector fills.

The Iceberg output plugs into Redpanda Connect's full ecosystem of 300+ inputs and processors. That means any source you can read from, you can now land directly into Iceberg tables with whatever transformations you need along the way.
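As a sketch of what that looks like, swapping the source is a one-block change. For example, to accept events over an HTTP webhook instead of a topic (the endpoint path below is illustrative):

```yaml
input:
  http_server:
    path: /ingest   # illustrative path; POST JSON events to this endpoint
```

Everything downstream, the processors and the Iceberg output, stays the same.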

Example pipeline

Here's a pipeline that reads events from a Redpanda topic, enriches each message with an ingestion timestamp, and routes them into per-type Iceberg tables:

input:
  redpanda:
    seed_brokers: ["${REDPANDA_BROKERS}"]
    topics: ["events"]
    consumer_group: "iceberg-sink"

pipeline:
  processors:
    - mapping: |
        root = this
        root.ingested_at = now()

output:
  iceberg:
    catalog:
      url: https://polaris.example.com/api/catalog
      warehouse: analytics
      auth:
        oauth2:
          client_id: "${CATALOG_CLIENT_ID}"
          client_secret: "${CATALOG_CLIENT_SECRET}"
    namespace: raw.events
    table: 'events_${!this.event_type}'
    storage:
      aws_s3:
        bucket: my-iceberg-data
        region: us-west-2

The value for seed_brokers uses a Redpanda Cloud contextual variable that's available out of the box in your environment. The field is optional for the redpanda input, but it's included above for clarity.

The table and namespace fields both support Bloblang interpolation, so a single pipeline can route messages to different tables based on content. Traditional Iceberg connectors often lead to "configuration hell," where every new table requires rigid mapping and brittle, manual updates. Suffer no more with Redpanda Connect!

Reshape your data before it lands

With Bloblang, you can:  

  • Reshape, filter, and enrich messages inline 
  • Flatten nested JSON into a columnar-friendly schema 
  • Strip sensitive fields before they reach the lakehouse
  • Derive new columns from existing ones 

It's all just a mapping processor in your pipeline block, running before the Iceberg output ever sees the message.
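For example, a single mapping can flatten, mask, and derive in one pass. The field names here are hypothetical, just to sketch the shape of such a mapping:

```yaml
pipeline:
  processors:
    - mapping: |
        # flatten a nested user object into top-level, columnar-friendly fields
        root.user_id = this.user.id
        root.user_country = this.user.address.country
        # strip a sensitive field before it reaches the lakehouse
        root.email = deleted()
        # derive a new column from an existing one
        root.is_mobile = this.user_agent.contains("Mobile")
```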

Your analysts get clean, query-ready tables. Your engineers get a single pipeline definition to maintain. No sidecar services, no separate Flink job.

Works with your catalog

The connector speaks the Iceberg REST Catalog API, so it works with the catalogs you're probably already running:

  • Apache Polaris™
  • AWS Glue Data Catalog
  • Databricks Unity Catalog
  • Snowflake Open Catalog
  • GCP BigLake

If your catalog speaks REST, you can point the connector at it.

Small in size, big on benefits

Redpanda Connect's new Iceberg output helps teams move quickly and efficiently, so you can spend more time building and analyzing instead of moving and preparing data.

Less schema maintenance

While other connectors can technically evolve a schema, doing so without a schema registry usually forces you into "maintenance toil" (chaining brittle Kafka Connect SMTs) or leaves you with "dirty data" (where all columns land as string data types). Redpanda Connect gives you the best of both worlds: the flexibility of raw JSON with the precision of a structured lakehouse.

We handle cleaning, masking, and landing in a single pipeline. The Iceberg output also supports schema evolution: it detects new fields in an incoming JSON stream and automatically updates the Iceberg table metadata. No manual DDL, no registry required, and no ticket for the ops team every time an app update adds a column.
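As an illustrative sketch (the field names are hypothetical, and the exact inferred column types depend on the connector's mapping rules), a stream that starts carrying a new field simply grows the table:

```yaml
# Day 1: payloads land with columns event_type and user_id
{"event_type": "signup", "user_id": 42}
# Day 2: a "plan" field appears, and the table gains a matching column automatically
{"event_type": "signup", "user_id": 43, "plan": "pro"}
```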

Efficient at scale

Stop paying for quiet data sources and achieve greater resource density. Unlike legacy connectors that heartbeat on a fixed timer regardless of activity, Redpanda Connect uses data-driven flushing. It only executes a flush operation when there is actual data to move, preventing the "small file problem" on object storage and ensuring you aren't wasting compute cycles on empty operations.
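As a sketch of how flushing might be tuned, assuming the iceberg output exposes Redpanda Connect's standard batching block (see the configuration reference for the authoritative fields):

```yaml
output:
  iceberg:
    # ...catalog, namespace, table, and storage as shown earlier...
    batching:
      count: 5000   # flush once 5,000 messages are buffered...
      period: 30s   # ...or after 30 seconds, whichever comes first
```

With data-driven flushing, an idle source buffers nothing and triggers no empty commits, so small-file buildup on object storage is avoided.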

Enterprise-grade governance

We speak security and isolation. Redpanda Connect fits into your existing OAuth2 token exchange and per-tenant REST catalog (like Polaris) workflows out of the box. And because Redpanda Connect is so lightweight (runs as low as 0.1 vCPU), you can deploy isolated, high-density pipelines for every tenant or department without blowing your cloud budget.

When to use Iceberg output

Here's an overview of when to use the in-broker Redpanda Iceberg Topics integration versus the Iceberg output in Redpanda Connect.

Primary value
  • Iceberg Topics (in-broker): Zero-ETL performance. Lowest latency from stream to table.
  • Iceberg output (sink connector): Integration flexibility. Route, transform, and automate in-stream before landing to tables.

Best for
  • Iceberg Topics: High-throughput, standard Kafka-to-lakehouse workloads.
  • Iceberg output: Complex pipelines, non-Kafka sources, and "set-and-forget" schemas.

Data sources
  • Iceberg Topics: Redpanda streaming topics only.
  • Iceberg output: Hundreds of sources (HTTP, CDC, SQS, Kinesis, etc.).

Schema evolution
  • Iceberg Topics: Registry-driven. Evolves automatically as you update Avro/Protobuf/JSON schemas in the Schema Registry.
  • Iceberg output: Data-driven. Table structure can evolve automatically from raw JSON, no registry required.

Routing
  • Iceberg Topics: Optimized for 1 topic → 1 table.
  • Iceberg output: Multi-table. Route to many tables from one stream.

Infrastructure
  • Iceberg Topics: Zero extra components.
  • Iceberg output: Stateless container (stateless pipeline) on Kubernetes.

Availability
  • Iceberg Topics: Redpanda Cloud BYOC or Self-Managed Enterprise Edition.
  • Iceberg output: Enterprise-tier connector for Redpanda Connect (requires a license).

Getting started

The Iceberg output ships with Redpanda Connect v4.80.0. This initial release focuses on high-speed append-only ingestion (with upserts on the roadmap). 

Pull the latest from our Docs and write your first pipeline, then query your tables from the analytics engine of your choice.

Check out the full configuration reference for every field and option, including partition spec expressions, commit tuning, and batching configuration. Build your first pipeline today and start landing data on your own terms: stream, table, or all of the above!
