
Stream events from Redpanda Connect into CyborgDB for confidential, real-time Enterprise AI workflows
Enterprise AI adoption faces a critical security gap. Organizations are streaming sensitive data like transaction logs, customer interactions, and proprietary metrics into vector databases for retrieval-augmented generation (RAG) and semantic search.
But here's the problem: traditional vector databases operate on vector embeddings in plaintext, creating a honeypot of concentrated organizational knowledge. A single breach can expose years of business intelligence, customer data, and trade secrets.
The stakes are especially high in regulated industries. Financial institutions processing millions of transactions, healthcare systems analyzing patient data, and government agencies handling classified information all need real-time AI capabilities. Yet current solutions force them to choose between innovation and compliance. Stream processing for AI often means exposing vectors that can be inverted to reconstruct original sensitive content.
Cyborg has partnered with Redpanda to solve this with a streaming pipeline that encrypts vectors before they're stored, enabling semantic search and RAG applications on encrypted data. No more plaintext embeddings sitting in databases waiting to be breached.
In this post, you'll learn how to add CyborgDB to your Redpanda Connect pipeline, enabling semantic search and RAG applications while keeping your vectors encrypted. We'll also highlight example use cases, security best practices, and how to deploy this powerful duo in production.
Think of Redpanda Connect as your Swiss Army knife for streaming data. It's a lightweight, Apache Kafka®-compatible stream processing and connector framework that moves data between systems without the operational overhead of traditional Kafka Connect deployments. Teams love it because it starts in seconds (not minutes), uses 10x less memory, and comes with 300+ built-in connectors.
For AI workloads, Redpanda Connect shines at ingesting high-volume event streams — transaction logs, sensor data, or user interactions — and routing them to downstream processors.
CyborgDB is the first vector encryption proxy that keeps your embeddings encrypted during search operations. While traditional vector databases need embeddings in plaintext to perform similarity searches (creating a security nightmare), CyborgDB uses cryptographic techniques, including forward-privacy SHA3 hashing and AES-256 symmetric encryption, to search directly on encrypted vectors. (You can read more about CyborgDB’s encryption schemes.)
Rather than storing vectors directly, CyborgDB transforms your existing database infrastructure (PostgreSQL, Redis, or other supported backends) into an encrypted vector store. This means leveraging your existing database investments and operational expertise while adding encrypted vector search capabilities. Your vectors are encrypted client-side before being persisted to your chosen backend, ensuring they remain protected at rest, in transit, and during use.
Vector embeddings aren't just random numbers — they're compressed representations of your data that can be inverted to reconstruct the original content. In regulated industries like healthcare and finance, exposed embeddings mean compliance violations and breach notifications. CyborgDB eliminates this risk.
Note that while CyborgDB provides end-to-end encryption for stored vectors, the data flowing through Redpanda Connect itself follows standard streaming security practices. The encryption Cyborg provides kicks in when data is transformed into vectors and stored in CyborgDB.
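In practice, that usually means enabling TLS and SASL on the broker connection. Here's a minimal sketch of what that could look like on the kafka input used later in this post; the broker address, SASL mechanism, and credential variables are placeholders, so match them to your cluster's actual security settings:

input:
  kafka:
    addresses: ["broker-0.example.com:9093"]  # placeholder broker address
    topics: ["your_topic"]
    consumer_group: "your_consumer_group"
    tls:
      enabled: true                  # encrypt traffic between Connect and the brokers
    sasl:
      mechanism: SCRAM-SHA-256       # or whichever mechanism your cluster enforces
      user: "${KAFKA_USER}"
      password: "${KAFKA_PASSWORD}"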
The CyborgDB output in Redpanda Connect is available in both Cloud and Self-Managed deployments. Redpanda Connect handles the real-time data ingestion and transformation, while CyborgDB provides the encrypted vector storage and search. Together, they create a streaming AI pipeline that's both blazing fast and cryptographically secure.
Financial firms use this for real-time fraud detection, healthcare systems for patient monitoring, and retailers for instant personalization — all without exposing sensitive data in vector form.
First, set up CyborgDB using Docker:
# Pull and run CyborgDB with PostgreSQL backend
docker run -d -p 8000:8000 \
  -e CYBORGDB_DB_TYPE=postgres \
  -e CYBORGDB_CONNECTION_STRING="host=postgres port=5432 dbname=cyborgdb user=cyborgdb password=secure_password" \
  -e CYBORGDB_API_KEY="your_cyborgdb_api_key" \
  cyborginc/cyborgdb-service:latest
# Or with Redis backend
docker run -d -p 8000:8000 \
  -e CYBORGDB_DB_TYPE=redis \
  -e CYBORGDB_CONNECTION_STRING="host=redis,port=6379,db=0" \
  -e CYBORGDB_API_KEY="your_cyborgdb_api_key" \
  cyborginc/cyborgdb-service:latest
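Before moving on, it's worth a quick check that the container actually started. The commands below simply locate the container by image and read its recent logs; the exact startup messages will vary by CyborgDB version:

# Confirm the CyborgDB container is running and inspect its startup logs
docker ps --filter "ancestor=cyborginc/cyborgdb-service:latest"
docker logs --tail 50 $(docker ps -q --filter "ancestor=cyborginc/cyborgdb-service:latest")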
Generate a secure encryption key for your index:

# Quick start: Generate a 32-byte key and encode as base64
export CYBORGDB_INDEX_KEY=$(openssl rand -base64 32)
echo "Save this key securely: $CYBORGDB_INDEX_KEY"

Important: For production deployments, we strongly recommend using a Key Management Service (KMS) instead of storing raw keys. Redpanda Connect supports integration with AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, and other KMS providers. See Redpanda's secrets management documentation for configuration details.
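As one hedged example of that pattern, if the index key lives in AWS Secrets Manager (the secret name cyborgdb/index-key below is an assumption), you can resolve it into the environment variable the pipeline references instead of writing the raw key into any file:

# Resolve the base64-encoded index key from AWS Secrets Manager at deploy time
export CYBORGDB_INDEX_KEY=$(aws secretsmanager get-secret-value \
  --secret-id cyborgdb/index-key \
  --query SecretString \
  --output text)

Redpanda Connect then reads the key through the ${CYBORGDB_INDEX_KEY} interpolation in the config below, so it never appears in the pipeline YAML.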
Add CyborgDB to your existing Redpanda Connect pipeline. Here's a complete example showing how to stream data with embeddings into encrypted storage:
# Your existing input and processors remain unchanged
input:
  kafka:
    addresses: ["localhost:9092"]
    topics: ["your_topic"]
    consumer_group: "your_consumer_group"

pipeline:
  processors:
    # Your existing processors...
    - label: "generate_embedding"
      # Your embedding generation logic

# Add CyborgDB as the output for encrypted vector storage
output:
  cyborgdb:
    host: "localhost:8000"                   # CyborgDB service endpoint
    api_key: "${CYBORGDB_API_KEY}"           # Your CyborgDB API key
    index_name: "production_vectors"         # Name for your encrypted index
    index_key: "${CYBORGDB_INDEX_KEY}"       # Base64-encoded 32-byte encryption key
    create_if_missing: true                  # Auto-create index on first write
    operation: "upsert"                      # upsert or delete

    # Map your document ID
    id: "${! json(\"id\") }"                 # Extract ID from your message

    # Map your embedding vector
    vector_mapping: "root = this.embedding"  # Path to embedding array

    # Optional: Include metadata for filtering
    metadata_mapping: |
      root = {
        "timestamp": this.timestamp,
        "category": this.category,
        "user_id": this.user_id
      }

    # Batching for optimal performance
    batching:
      count: 100     # Batch size
      period: "1s"   # Max wait time
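For reference, here's the shape of message the mappings above assume is arriving at the output after embedding generation. The field names mirror the config (id, embedding, timestamp, category, user_id), and only a few vector dimensions are shown where a real embedding would have hundreds or thousands:

{
  "id": "evt-000123",
  "timestamp": "2025-01-14T10:32:07Z",
  "category": "checkout",
  "user_id": "u-84721",
  "embedding": [0.0132, -0.2871, 0.4419]
}

Against this message, id resolves to "evt-000123", vector_mapping pulls out the embedding array, and metadata_mapping stores the remaining three fields alongside the encrypted vector so they can be used as filters at query time.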
Essential settings:
- host: Your CyborgDB service endpoint
- api_key: Authentication key from cyborg.co
- index_name: Unique name for your encrypted vector collection
- index_key: Base64-encoded 32-byte encryption key (use the Redpanda Secrets Guide)

Data mappings:
- id: Unique identifier for each vector (required)
- vector_mapping: Bloblang expression to extract the embedding array
- metadata_mapping: Optional metadata for filtering during search

Performance tuning:
- batching.count: Number of vectors to batch (100-500 recommended)
- batching.period: Maximum time to wait before sending a partial batch
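The configs in this post leave the generate_embedding step as a placeholder. One hedged way to fill it is with Redpanda Connect's standard branch and http processors calling an external embedding API; the endpoint, model name, and response shape below assume an OpenAI-style service, so adapt them to whatever embedding provider you actually use:

pipeline:
  processors:
    - label: "generate_embedding"
      # Send only the text field to the embedding API, then merge the
      # returned vector back into the original message
      branch:
        request_map: |
          root.model = "text-embedding-3-small"  # assumed model name
          root.input = this.text                 # assumes a "text" field on the message
        processors:
          - http:
              url: "https://api.openai.com/v1/embeddings"  # assumed endpoint
              verb: POST
              headers:
                Authorization: "Bearer ${OPENAI_API_KEY}"
                Content-Type: "application/json"
        result_map: |
          # OpenAI-style responses put the vector at data[0].embedding
          root.embedding = this.data.index(0).embedding

Any provider works here as long as the step ends with the vector at the path your vector_mapping expects (this.embedding in the examples above).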
Fraud detection pipeline:

output:
  cyborgdb:
    host: "${CYBORGDB_HOST}"
    api_key: "${CYBORGDB_API_KEY}"
    index_name: "fraud_patterns"
    index_key: "${FRAUD_INDEX_KEY}"
    operation: "upsert"
    id: "${! json(\"transaction_id\") }"
    vector_mapping: "root = this.transaction_embedding"
    metadata_mapping: |
      root = {
        "amount": this.amount,
        "merchant_category": this.merchant_category,
        "risk_score": this.risk_score
      }

RAG document pipeline:
output:
  cyborgdb:
    host: "${CYBORGDB_HOST}"
    api_key: "${CYBORGDB_API_KEY}"
    index_name: "knowledge_base"
    index_key: "${KB_INDEX_KEY}"
    operation: "upsert"
    id: "${! json(\"doc_id\") }"
    vector_mapping: "root = this.content_embedding"
    metadata_mapping: |
      root = {
        "source": this.source,
        "department": this.department,
        "last_updated": this.timestamp,
        "access_level": this.access_level
      }
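Once documents are flowing into the knowledge_base index, your RAG service queries it with the same index key used at ingest, and CyborgDB performs the similarity search over the encrypted vectors. The endpoint path and request fields below are assumptions for illustration only; check the CyborgDB API documentation for the actual schema of your service version:

# Illustrative query call -- verify the path and payload against the CyborgDB docs
curl -s -X POST "http://localhost:8000/v1/indexes/knowledge_base/query" \
  -H "X-API-Key: ${CYBORGDB_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
        "index_key": "'"${CYBORGDB_INDEX_KEY}"'",
        "query_vector": [0.0132, -0.2871, 0.4419],
        "top_k": 5,
        "filters": {"department": "finance"}
      }'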
Cyborg sees these numbers in production deployments:

Key management:
Network security:
Compliance benefits:
For production environments, you can deploy both services together:
version: '3.8'

services:
  cyborgdb:
    image: cyborginc/cyborgdb-service:latest
    ports:
      - "8000:8000"
    environment:
      - CYBORGDB_DB_TYPE=postgres
      - CYBORGDB_CONNECTION_STRING=host=postgres port=5432 dbname=cyborgdb user=cyborgdb password=${DB_PASSWORD}
      - CYBORGDB_API_KEY=${CYBORGDB_API_KEY}
      - SSL_CERT_PATH=/certs/server.crt  # For HTTPS
      - SSL_KEY_PATH=/certs/server.key
    volumes:
      - ./certs:/certs
    depends_on:
      - postgres

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=cyborgdb
      - POSTGRES_USER=cyborgdb
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redpanda-connect:
    image: redpandadata/connect:latest
    volumes:
      - ./pipeline.yaml:/pipeline.yaml
    command: run /pipeline.yaml
    environment:
      - CYBORGDB_API_KEY=${CYBORGDB_API_KEY}
      - CYBORGDB_INDEX_KEY=${CYBORGDB_INDEX_KEY}
    depends_on:
      - cyborgdb

volumes:
  postgres_data:
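With the pipeline config from earlier saved as pipeline.yaml next to this compose file (that's the path the volume mount expects), the whole stack comes up with one command:

# Start PostgreSQL, CyborgDB, and Redpanda Connect together
docker compose up -d

# Tail the connector logs to confirm events are landing in the encrypted index
docker compose logs -f redpanda-connect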
Cyborg and Redpanda have created a streaming pipeline that solves a critical enterprise need: real-time AI that keeps your vectors encrypted even during search operations. By adding CyborgDB to your Redpanda Connect pipeline, you can finally deploy AI in regulated environments without compromising on security or performance.
The integration is straightforward: add the CyborgDB output to your existing pipeline configuration, generate an encryption key, and your vectors are automatically encrypted before storage and stay encrypted in use. Your compliance team gets the security they need, your engineering team gets a simple integration, and your data scientists get the real-time AI capabilities they want.
Ready to secure your streaming AI pipeline? Join the Redpanda Community Slack to discuss your use case, or get your CyborgDB API key to start building. Questions about compliance or enterprise features? Contact the Cyborg team.