
NOTE: Redpanda Connect now has a Snowflake connector with 2X higher throughput than Kafka Connect. Updated tutorial coming soon! In the meantime, check out our new Snowflake connector.
We are in the era of cloud computing. While scalability, high availability, and other nonfunctional requirements can be obtained successfully in the cloud, the operational burden might still be a problem. Using your own middleware or your own relational database in the cloud might require heavy operational efforts that consume a lot of energy and introduce errors into the process.
As a software-as-a-service (SaaS) data warehouse, Snowflake provides solutions to data processing and analytics that are faster, easier to use, and more flexible than traditional offerings, and it integrates well with Apache Kafka®.
You can ingest data into Snowflake in multiple ways. For instance, you could use Kafka to create a data pipeline for user-behavior data from a platform and run analytics on a Snowflake database in real time. Or you could capture data changes from a platform, produce them through Kafka, and save them as time-stamped archive data.
In both examples, you would need a fast, reliable data-streaming platform and a flexible, easy-to-manage platform for data storage and analytics.
Redpanda is a Kafka-compatible streaming data platform that is dramatically simpler to use, with 10x faster (and stable) average latencies, and is safe by default. So when building data pipelines for mission critical workloads, particularly when you’re moving towards real-time data processing, pairing Redpanda with Snowflake is advantageous for these types of use cases.
In this tutorial, you will learn how to do the following:
- Create a Snowflake database and a table for data-archiving purposes
- Set up and run a Redpanda cluster and create topics for Kafka Connect usage
- Configure and run a Kafka Connect cluster for Redpanda and Snowflake integration
You can follow along with the tutorial using this repository.
Prerequisites #
You’ll need the following for this tutorial:
- A Snowflake account and the SnowSQL CLI installed (follow these instructions)
- A machine to install Redpanda and Kafka Connect (this tutorial uses Linux, but you can run Redpanda on Docker, MacOS and Windows, as well)
- Java 11 to run the producer application
Use case: connecting Redpanda to Snowflake with Kafka Connect #
PandaBooks LLC is a fictional bookstore that has two main branches worldwide, London and New York. In my previous Kafka Connect tutorial, you created an inventory-feeding mechanism for each branch.
Now you need to archive changes in the main inventory file, which is in CSV format. While the inventory-distributor mechanism sends the data to two different branch inventories, it must also send any data change to the archive. The company wants to keep the archive in Istanbul and use Snowflake as the archive database.
The following image is a high-level diagram of PandaBooks LLC, including the new requirements highlighted:

If you have already gone through the previous tutorial, you can skip some steps, such as setting up Redpanda or creating a Kafka Connect cluster. If this is your first time in this series, you must follow all the steps.
Setting up Redpanda #
On a Linux environment, run the following commands to install Redpanda as a system service:
On Fedora/Red Hat systems:
Run the setup script to download and install the repo
Use yum to install redpanda
On Debian/Ubuntu systems:
Run the setup script to download and install the repo
Use apt to install redpanda
Start the Redpanda cluster with the following command:
You can verify if Redpanda is up and running:
The output should look like the following:
In this tutorial, you’ll create a single-node Redpanda cluster that is not suitable for a production environment.
To enable the inventory-distributor
application and the Kafka Connect cluster to work properly, you must define the Kafka topic for the Istanbul archive. You can use Redpanda Keeper (rpk
), Redpanda’s command-line interface (CLI) to create topics on the Redpanda cluster.
You can access the CLI by running the rpk
command. (View the rpk documentation for more available commands.)
Run the following command to create a topic for the Istanbul archive:
Verify that you have the topics created:
This should return the following output:
Notice that you do not specify a partition count or replication factor for the topics. For a production environment, you would need to create topics with suitable configuration for the cluster structure and the designed architecture.
Setting up Snowflake #
In your browser, log in to your Snowflake account and click Organization on the left menu to get your generated organization name and account name.
In your terminal window, log in to your Snowflake account by using SnowSQL.
After entering your prompted password, you ought to see this output:
Create a database called inventory_archive
in the public schema:
Create a table called pandabooks_istanbul
in the inventory_archive
database.
The table must have record_metadata
and record_content
with the type variant
because it is the compatible format with the JSON type of the Snowflake connector.
Setting up the key pair authentication
The Snowflake Kafka connector uses key pair authentication, so you must define a private and public key pair and configure them for your user. Optionally, you can use another user with different privileges.
For more information, refer to the documentation on configuring the Kafka connector.
In your terminal, run the following command to create a private key:
Enter 1234567 for the passphrase when it is prompted.
In the same directory, create a public key with the following command:
Print the public key in the single line format:
Copy the public key without the header and footer that begins and ends with -----, and set it as the public key of your Snowflake user. To do this, open your SnowSQL session and run the following command to alter your user:
You can describe your user to validate that it has the updated key.
You’ll need your private key while setting up the Snowflake connector.
Setting up Kafka Connect #
Kafka Connect is an integration tool released with the Apache Kafka project. It provides reliable data streaming between Apache Kafka and external systems. It is both scalable and flexible, and you can use it to integrate with any system, including databases, search indexes, and cloud storage providers.
Redpanda is fully compatible with the Kafka API. This means that Redpanda gives you seamless integration with the Kafka ecosystem so your existing investments work right out of the box.
Kafka Connect uses source and sink connectors for integration. Source connectors stream data from an external system to Kafka. Sink connectors stream data from Kafka to an external system.

To get Kafka Connect, you must download the Apache Kafka package. Navigate to the Apache downloads page for Kafka, and click the suggested download link for the Kafka 3.1.0 binary package.
Create a folder called pandabooks_integration
in your home directory, and extract the Kafka binaries file in this directory. You can use the following commands by changing the paths if necessary.
Configuring the Connect cluster
To run a Kafka Connect cluster, set up a configuration file in the properties format.
In the pandabooks_integration
folder, create a folder called configuration
. Create a file in this directory called connect.properties
, and add the following content:
Set the bootstrap.servers
value to localhost:9092
. This configures the Connect cluster to use the Redpanda cluster.
Configure the plugin.path
, which you’ll use to put the connector binaries. Create a folder called plugins
in the pandabooks_integration
directory. In the plugins
directory, create a folder called snowflake
. You’ll add the connector binaries in the plugins/snowflake
directory.
Download the binaries by running the following command set:
Copy the binaries into the plugins
directory:
The final folder structure for pandabooks_integration
will be as follows:
Change the plugin.path
value to /home/YOUR_USER_NAME/pandabooks_integration/plugins
. This configures the Connect cluster to use the Redpanda cluster.
The final connect.properties
file will be as follows:
Configuring the connectors
Setting up the connector plugins in a Kafka Connect cluster for achieving integration with external systems is not enough. The cluster needs connectors configured to execute the integration, so next you’ll configure the sink connectors for Snowflake.
Create a file called snowflake-sink-connector.properties
in the ~/pandabooks_integration/configuration
directory with the following content:
Some of the values are already set, but some are left blank. Set the following values for the keys in the snowflake-sink-connector.properties
file:
KeyValueconnector.classcom.snowflake.kafka.connector.SnowflakeSinkConnectorvalue.convertercom.snowflake.kafka.connector.records.SnowflakeJsonConvertertopicsistanbul-archivesnowflake.topic2table.mapistanbul-archive:pandabooks_istanbulsnowflake.url.name_YOURORGANIZATION-_YOUR_ACCOUNTNAME.snowflakecomputing.com:443snowflake.user.name_YOUR_SNOWFLAKEUSERsnowflake.private.key_YOUR_SNOWFLAKE_PRIVATEKEYsnowflake.database.nameinventory_archive
To get _YOUR_SNOWFLAKE_PRIVATEKEY
, navigate to the directory of your private key and print out your private key in the single line format. You must set it in the properties
file as a value for snowflake.private.key
.
Running the Kafka Connect cluster
To run the cluster with the configurations that you applied, open a new terminal window and navigate to the ~/pandabooks_integration/configuration
directory.
Run the following command in the directory:
The output should look like the following:
Note that you’re running the Kafka Connect cluster in standalone mode. Kafka Connect also has a distributed mode that accepts connector configurations via its REST API. The standalone mode fits the proposed architecture better because you are working on a lightweight integration scenario.
For more information about distributed mode, check the running Kafka Connect section of the Kafka documentation.
Running the inventory distributor #
Download the inventory-distributor
application binary by clicking this link in the ~/pandabooks_integration
directory.
Download the main inventory file book-inventory.csv
in the ~/pandabooks_integration
directory with the following command:
Check the book-inventory.cs
file. It has five book records to be sent to the london
and newyork
inventories. The inventory-distributor
application distributes book data to these inventories and sends any change on these records to the Istanbul archive.
Run the application with the following command in a new terminal window and keep it running:
You should see the following output:
The inventory-distributor
application scans the changes in the book-inventory.csv
file continuously and sends them to the relevant Kafka topic. The Snowflake Sink Kafka Connect connector consumes the book data from the istanbul-archive
topic and saves it to Snowflake for archiving.
In your Snowflake session, run a select query to check if the content is saved in the Snowflake database. Sending the data to Snowflake might take some time because of the connector configuration, so you might need to run the query a few times until you see the listed data.
When you examine the contents of the book-inventory.csv
file, you realize that part of the book data is wrong. The book The Metamorphosis is in English, but the providers set it as German:
978-0553213690,The Metamorphosis,Franz Kafka,2009,Classix Press,German,newyork
Fix this record by replacing the German
field with English
, and save the file. Be sure that the inventory-distributor
application is still running.
Query the Snowflake table again to see if the change is archived:
Notice that both records belong to the same book data from the inventory, but the second corrected record has a bigger CreateTime
.
Finally, add a new book record in the book-inventory.csv
file for the London inventory. Add the following line in the book-inventory.csv
file:
978-1591846352,The Obstacle Is the Way,Ryan Holiday,2014,Portfolio,English,london
You should see the following output:
Congratulations! You have successfully set up an archival mechanism for the bookstore.
Conclusion #
In this tutorial, you created a Snowflake database and a table for keeping the bookstore’s inventory archive. You created a Redpanda cluster and a topic for the archive to use in your connector configurations. You also created and configured a Kafka Connect cluster to use Redpanda and configured a connector to stream book data to the Istanbul archive.
By using Kafka Connect, you can integrate Redpanda with Snowflake, which helps reduce the cost of operations and the operational burden while providing scaling with all the benefits of cloud. This can help a variety of businesses to better operate, store, and analyze their vital data.
You can find the entire code for this tutorial in this repository. You can also join Redpanda’s Slack community, and contribute to Redpanda’s GitHub.
Related articles
VIEW ALL POSTSLet's keep in touch
Subscribe and never miss another blog post, announcement, or community event. We hate spam and will never sell your contact information.