Real-time analytics at scale: Redpanda and Snowflake Streaming

How we streamed 14.5 GB/s to Snowflake with 7.5 second P99 latency

October 2, 2025

When you’re monitoring fast-moving markets or running critical analytics, every second matters. Organizations can’t afford to wait minutes or hours for insights.

Redpanda is known for its speed and simplicity, so we ran a benchmark to land on the highest-throughput, lowest-latency streaming data pipeline using Redpanda and Snowflake for near real-time analytics on equity market data.

Here’s the TL;DR:

  • We streamed 3.8 billion messages at 14.5 GB per second from Redpanda to a single Snowflake table with a P50 latency of about two seconds and a P99 under eight seconds: near real-time analytics at massive scale.
  • We went from concept to production-scale pipelines in the cloud within a day, aided by Redpanda’s automation tooling, proving real-time analytics doesn’t require weeks of engineering effort.
  • We used a random 1 KB message payload to mitigate the effects of compression.
  • We tuned for the delicate balance of throughput vs. latency, with key optimizations you can replicate to maximize your own real-time analytics performance.

To show you how we got there, we’ll walk through the setup, share benchmark results, and highlight key tuning insights so you can get the most out of your own data pipelines. 

Why Redpanda Connect + Snowflake Streaming?

Before diving into benchmarks, let’s cover the pipeline architecture.

Redpanda: streaming without the complexity

Redpanda is a radically efficient data streaming platform built for performance and fully compatible with Apache Kafka® APIs. For this benchmark, we deployed a 9-node Redpanda Enterprise cluster on AWS EC2 m7gd.16xlarge instances, which gave us the scale and reliability needed to handle our workload. Deployment was automated and efficient using the official Terraform and Ansible tooling.

Redpanda Connect: pipelines in minutes, not days

Redpanda Connect offers 300+ pre-built connectors we can take off the shelf to compose our pipelines with little to no code. 

Redpanda Connect has a snowflake_streaming connector that’s optimized for high-throughput, low-latency ingest use cases, which are common among Redpanda users. The connector is based on the Snowpipe Streaming API and supports schema evolution, meaning we can add new columns at will. It also allows us to parallelize, batch, and achieve exactly-once delivery.

For the benchmark, we stood up 12 Redpanda Connect nodes on AWS EC2 m7gd.12xlarge instances.

Setting up the end-to-end pipeline

The use case was to stream messages into Snowflake table rows as fast and as close to real time as possible for downstream analytics use cases, like market surveillance. Our destination for this data was a table in Snowflake on AWS.

Here’s an overview of how it works:

Architecture diagram

Step 1: Populate the topic

First, we created a dataset to simulate a high-volume workload by preloading a 1,200-partition topic with 3.8 billion randomized Avro-encoded messages, each exactly 1,000 bytes (1 KB) in size.
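To see why a randomized payload blunts compression, and how much data the preloaded topic holds, here is a minimal Python sketch. It uses raw random bytes and zlib as stand-ins; the benchmark itself used Avro-encoded messages, and the repetitive sample record below is made up purely for contrast.

```python
import os
import zlib

PAYLOAD_BYTES = 1_000            # message size stated in the post
MESSAGE_COUNT = 3_800_000_000    # messages preloaded into the topic
PARTITIONS = 1_200               # partition count of the topic


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (1.0 means no savings)."""
    return len(zlib.compress(payload)) / len(payload)


# Random bytes: a stand-in for the randomized benchmark payload.
random_payload = os.urandom(PAYLOAD_BYTES)

# Hypothetical repetitive record, only here to show the contrast.
repetitive_payload = (b"AAPL,189.25,100;" * 63)[:PAYLOAD_BYTES]

total_bytes = PAYLOAD_BYTES * MESSAGE_COUNT
messages_per_partition = MESSAGE_COUNT / PARTITIONS

print(f"random payload ratio:     {compression_ratio(random_payload):.2f}")
print(f"repetitive payload ratio: {compression_ratio(repetitive_payload):.2f}")
print(f"dataset size:             {total_bytes / 1e12:.1f} TB")
print(f"messages per partition:   {messages_per_partition:,.0f}")
```

Random data barely shrinks, so the measured GB/s reflects bytes that actually cross the wire rather than a favorable compression ratio.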

Step 2: Configure the pipeline

Each Redpanda Connect node was configured with a pipeline job using the kafka_franz input connector to read from the Redpanda topic and the snowflake_streaming output connector to insert rows into a Snowflake table following the defined schema. 

We also used Redpanda Connect’s broker input and output to parallelize the inputs and outputs on each node, making full use of the system's resources and boosting performance.

Check out an example of the pipeline code on GitHub.
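For a rough idea of the shape such a pipeline takes, the Python sketch below builds a config structure with a broker input that runs several copies of a kafka_franz consumer and a broker output that spreads batches across several snowflake_streaming outputs. It is not the benchmark's configuration: the broker address, topic, consumer group, channel prefixes, and every numeric value are hypothetical, the Snowflake connection settings are left out, and the field names reflect our reading of the Redpanda Connect docs, so verify them against the GitHub example before use.

```python
import json


def build_pipeline_config(node_id: int, input_copies: int, output_count: int) -> dict:
    """Return an illustrative Redpanda Connect config as a plain dict."""
    outputs = [
        {
            "snowflake_streaming": {
                # Connection settings (account, credentials, table) omitted.
                # A distinct prefix per output keeps channel names unique.
                "channel_prefix": f"bench-node{node_id}-out{i}",  # hypothetical
                "max_in_flight": 4,                               # illustrative
                "batching": {"count": 50_000, "period": "1s"},    # illustrative
            }
        }
        for i in range(output_count)
    ]
    return {
        "input": {
            "broker": {
                # Run several identical consumers in one consumer group.
                "copies": input_copies,
                "inputs": [
                    {
                        "kafka_franz": {
                            "seed_brokers": ["redpanda-0.example.internal:9092"],
                            "topics": ["market_data"],
                            "consumer_group": "snowflake-bench",
                        }
                    }
                ],
            }
        },
        "output": {
            "broker": {
                # Spread batches across outputs instead of duplicating them.
                "pattern": "round_robin",
                "outputs": outputs,
            }
        },
    }


config = build_pipeline_config(node_id=0, input_copies=4, output_count=4)
print(json.dumps(config, indent=2))
```

Because JSON is a subset of YAML, the printed structure can be compared side by side with a YAML pipeline file.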

Step 3: Run the benchmarks

We ran a battery of performance tests, searching for the magic combination of connector options and scaling dimensions to achieve maximum throughput and lowest latency. 

Aside from a Redpanda cluster and Redpanda Connect nodes equipped with the raw compute, network, and storage resources to handle our workload, we found that what mattered most was tuning batch settings and increasing parallelism as much as possible, whether through partition count, snowflake_streaming options, or, as it turned out, scaling the number of inputs and outputs within a single node, which unlocked massive throughput.

We collected pipeline metrics with Prometheus and visualized them in Grafana over sampling windows representing the sustained peak for each test.
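If P50, P90, and P99 are unfamiliar, the following Python sketch shows what they mean using the nearest-rank method on raw samples. The latency values are made up for illustration, and this is not how the benchmark computed its figures: Prometheus typically estimates quantiles from histogram buckets rather than from individual samples.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up end-to-end latencies in seconds, for illustration only.
latencies_s = [1.8, 2.0, 2.1, 2.2, 2.3, 2.5, 3.0, 4.1, 6.9, 7.5]

summary = {f"P{p}": percentile(latencies_s, p) for p in (50, 90, 99)}
print(summary)
```

The gap between P50 and P99 in a summary like this is what the tuning work tries to narrow: a low median with a long tail still means some rows arrive late.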

Finding the sweet spot = blazing-fast speed at massive scale

To create a “control group” for this experiment, we first wanted to test how our pipeline performed without Snowflake in the picture. For this, we sent messages to the drop output. We capped out at 15.1 GB/s with 8.38 ms P99 latency, reading and decoding all 3.8 billion messages in five minutes total.

This became our target: how close to these numbers could we get when streaming this data to Snowflake? 

We observed the best balance between throughput and latency (the “sweet spot”) on a test that reached 14.5 GB/s with a P50 latency of 2.18 seconds and a P99 of 7.49 seconds. All 3.8 billion messages landed as rows in the Snowflake table within six minutes total.

Of that 7.49-second P99 latency, 86% can be attributed to the Snowflake upload, register, and commit steps.
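The headline numbers can be related to each other with some back-of-the-envelope arithmetic, sketched below in Python. It assumes decimal units (1 GB = 10^9 bytes) and that the quoted throughput is a sustained peak. The reported five- and six-minute totals are wall-clock times for the whole run, so the computed drain times are expected to come out shorter.

```python
GB = 1_000_000_000               # assumption: decimal gigabytes
PAYLOAD_BYTES = 1_000
MESSAGE_COUNT = 3_800_000_000

BASELINE_GB_PER_S = 15.1         # drop output, no Snowflake
SNOWFLAKE_GB_PER_S = 14.5        # "sweet spot" run
P99_SECONDS = 7.49
SNOWFLAKE_STEP_SHARE = 0.86      # upload, register, and commit steps

total_bytes = PAYLOAD_BYTES * MESSAGE_COUNT


def drain_seconds(throughput_gb_per_s: float) -> float:
    """Time to move the whole dataset at a constant throughput."""
    return total_bytes / (throughput_gb_per_s * GB)


baseline_drain_s = drain_seconds(BASELINE_GB_PER_S)
snowflake_drain_s = drain_seconds(SNOWFLAKE_GB_PER_S)
fraction_of_baseline = SNOWFLAKE_GB_PER_S / BASELINE_GB_PER_S
messages_per_second = SNOWFLAKE_GB_PER_S * GB / PAYLOAD_BYTES
snowflake_side_p99_s = P99_SECONDS * SNOWFLAKE_STEP_SHARE

print(f"drain at baseline peak:   {baseline_drain_s / 60:.1f} min")
print(f"drain at Snowflake peak:  {snowflake_drain_s / 60:.1f} min")
print(f"share of baseline kept:   {fraction_of_baseline:.0%}")
print(f"messages per second:      {messages_per_second:,.0f}")
print(f"P99 spent in Snowflake:   {snowflake_side_p99_s:.2f} s")
```

Read this way, adding Snowflake cost roughly 4% of the baseline throughput, and about 6.4 of the 7.49 seconds at P99 were spent in Snowflake-side steps.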

Note: Redpanda Connect and Snowflake were connected over the public internet; using AWS PrivateLink would reduce latency even further.

Here are a few shots of the winning test.

Peak sustained message throughput
P50, P90 and P99 end-to-end latency
Latencies of each step performed by the Snowflake connector

Key tuning insights

As promised, here are a few takeaways from our benchmark that might help you get the most out of your own data pipelines. 

  • We achieved 14.5 GB/s throughput, even though Snowflake notes that, for best performance, aggregate throughput into a single table should be limited to 10 GB/s. That’s 45% more throughput than that guidance suggests!
  • Using a binary format like Avro showed a ~20% throughput improvement over a textual format like JSON.
  • Using count as the batching factor resulted in higher performance than byte_size due to less calculation overhead.
  • Snowflake build steps pose a latency bottleneck, which can be mitigated by increasing build_parallelism to a value close to the number of available instance cores, reserving some for other processes. For example, we had 48-core machines and set this to 40.
  • Snowpipe Streaming channels maximize loading to Snowflake. We controlled the number of channels by combining channel_prefix with max_in_flight. Note that Snowflake supports a maximum of 10,000 channels per table. (We had the Snowpipe API screaming at us on several tests.)
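A quick way to sanity-check a parallelism plan against the 10,000-channel limit noted above is sketched below in Python. It treats the limit as applying to the single destination table and assumes each snowflake_streaming output opens up to max_in_flight channels under its own channel_prefix. The outputs-per-node and max_in_flight values are hypothetical, not the settings used in the benchmark.

```python
MAX_CHANNELS = 10_000    # Snowflake channel limit quoted in the post
CONNECT_NODES = 12       # Redpanda Connect nodes in the benchmark


def total_channels(nodes: int, outputs_per_node: int, max_in_flight: int) -> int:
    """Channels opened across the fleet, assuming one channel per in-flight
    slot of every output (an assumption of this sketch)."""
    return nodes * outputs_per_node * max_in_flight


def within_channel_limit(nodes: int, outputs_per_node: int, max_in_flight: int) -> bool:
    return total_channels(nodes, outputs_per_node, max_in_flight) <= MAX_CHANNELS


# Hypothetical plans, for illustration only.
modest_plan = total_channels(CONNECT_NODES, outputs_per_node=8, max_in_flight=64)
greedy_plan = total_channels(CONNECT_NODES, outputs_per_node=16, max_in_flight=64)

# Figures taken from the takeaways above.
build_core_share = 40 / 48                 # cores given to build steps
throughput_over_guidance = 14.5 / 10 - 1   # relative to the 10 GB/s guidance

print(f"modest plan: {modest_plan} channels")
print(f"greedy plan: {greedy_plan} channels")
print(f"cores used for build: {build_core_share:.0%}")
print(f"throughput above guidance: {throughput_over_guidance:.0%}")
```

Doubling the outputs per node in this example pushes the fleet past the limit, which is the kind of configuration that makes the Snowpipe API start rejecting requests.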

Power real-time analytics at the speed of Redpanda 

Redpanda Connect and Snowflake Streaming can deliver impressive throughput while keeping P99 latencies under 8 seconds in the “sweet spot” configuration.

For data architects and engineering leaders, these results prove you can confidently use Redpanda to power real-time analytics and business intelligence pipelines — for market surveillance, fraud detection, or operational dashboards — so businesses can get insights in seconds, not hours. 

Note that these benchmarks should serve as a guideline for building your own pipelines and tuning them to achieve your goals. The results can be extrapolated to better understand how to best size and configure for your use case.
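As one way to extrapolate, the Python sketch below scales the benchmark's node counts to a target throughput. It assumes throughput scales linearly with node count on the same instance types and the same 1 KB message size, which real workloads may not honor. The headroom factor and the target value are hypothetical inputs.

```python
import math

BENCH_GB_PER_S = 14.5      # sustained throughput in the sweet-spot run
BENCH_CONNECT_NODES = 12   # m7gd.12xlarge Redpanda Connect nodes
BENCH_BROKERS = 9          # m7gd.16xlarge Redpanda brokers


def estimate_nodes(target_gb_per_s: float, headroom: float = 0.25) -> dict:
    """Scale benchmark node counts linearly to a target throughput.

    `headroom` is extra capacity on top of the target (0.25 = 25%).
    Linear scaling is an assumption of this sketch, not a measured result.
    """
    required = target_gb_per_s * (1 + headroom)
    return {
        "connect_nodes": math.ceil(required * BENCH_CONNECT_NODES / BENCH_GB_PER_S),
        "brokers": math.ceil(required * BENCH_BROKERS / BENCH_GB_PER_S),
    }


# Hypothetical target: 5 GB/s with 25% headroom.
estimate = estimate_nodes(5.0, headroom=0.25)
print(estimate)
```

Treat the output as a starting point for your own load test rather than a sizing guarantee, since message size, format, and network path all shift the per-node ceiling.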

If you have questions about this benchmark or any of the technologies we used, ask away in the Redpanda Community Slack.
