
Stream events from Redpanda Connect into CyborgDB for confidential, real-time Enterprise AI workflows
Enterprise AI adoption faces a critical security gap. Organizations are streaming sensitive data like transaction logs, customer interactions, and proprietary metrics into vector databases for retrieval-augmented generation (RAG) and semantic search.
But here's the problem: traditional vector databases operate on vector embeddings in plaintext, creating a honeypot of concentrated organizational knowledge. A single breach can expose years of business intelligence, customer data, and trade secrets.
The stakes are especially high in regulated industries. Financial institutions processing millions of transactions, healthcare systems analyzing patient data, and government agencies handling classified information all need real-time AI capabilities. Yet current solutions force them to choose between innovation and compliance. Stream processing for AI often means exposing vectors that can be inverted to reconstruct original sensitive content.
Cyborg has partnered with Redpanda to solve this with a streaming pipeline that encrypts vectors before they're stored, enabling semantic search and RAG applications on encrypted data. No more plaintext embeddings sitting in databases waiting to be breached.
In this post, you'll learn how to add CyborgDB to your Redpanda Connect pipeline, enabling semantic search and RAG applications while keeping your vectors encrypted. We'll also highlight example use cases, security best practices, and how to deploy this powerful duo in production.
Think of Redpanda Connect as your Swiss Army knife for streaming data. It's a lightweight, Apache Kafka®-compatible stream processing and connector framework that moves data between systems without the operational overhead of traditional Kafka Connect deployments. Teams love it because it starts in seconds (not minutes), uses 10x less memory, and comes with 300+ built-in connectors.
For AI workloads, Redpanda Connect shines at ingesting high-volume event streams — transaction logs, sensor data, or user interactions — and routing them to downstream processors.
CyborgDB is the first vector encryption proxy that keeps your embeddings encrypted during search operations. While traditional vector databases need embeddings in plaintext to perform similarity searches (creating a security nightmare), CyborgDB uses cryptographic techniques, including forward-privacy SHA3 hashing and AES-256 symmetric encryption, to search directly on encrypted vectors. (You can read more about CyborgDB’s encryption schemes.)
Rather than storing vectors directly, CyborgDB transforms your existing database infrastructure (PostgreSQL, Redis, or other supported backends) into an encrypted vector store. This means leveraging your existing database investments and operational expertise while adding encrypted vector search capabilities. Your vectors are encrypted client-side before being persisted to your chosen backend, ensuring they remain protected at rest, in transit, and during use.
Vector embeddings aren't just random numbers — they're compressed representations of your data that can be inverted to reconstruct the original content. In regulated industries like healthcare and finance, exposed embeddings mean compliance violations and breach notifications. CyborgDB eliminates this risk.
Note that while CyborgDB provides end-to-end encryption for stored vectors, the data flowing through Redpanda Connect itself follows standard streaming security practices. The encryption Cyborg provides kicks in when data is transformed into vectors and stored in CyborgDB.
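In practice, that usually means enabling TLS and SASL on the broker connection. Here's a minimal sketch of what that could look like on the kafka input used later in this post; the broker address, SASL mechanism, and credential variables are placeholders, so match them to your cluster's actual security settings:

input:
  kafka:
    addresses: ["broker-0.example.com:9093"]  # placeholder broker address
    topics: ["your_topic"]
    consumer_group: "your_consumer_group"
    tls:
      enabled: true                  # encrypt traffic between Connect and the brokers
    sasl:
      mechanism: SCRAM-SHA-256       # or whichever mechanism your cluster enforces
      user: "${KAFKA_USER}"
      password: "${KAFKA_PASSWORD}"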
The CyborgDB output in Redpanda Connect is available in both Cloud and Self-Managed deployments. Redpanda Connect handles the real-time data ingestion and transformation, while CyborgDB provides the encrypted vector storage and search. Together, they create a streaming AI pipeline that's both blazing fast and cryptographically secure.
Financial firms use this for real-time fraud detection, healthcare systems for patient monitoring, and retailers for instant personalization — all without exposing sensitive data in vector form.
First, set up CyborgDB using Docker:
# Pull and run CyborgDB with PostgreSQL backend
docker run -d -p 8000:8000 \
  -e CYBORGDB_DB_TYPE=postgres \
  -e CYBORGDB_CONNECTION_STRING="host=postgres port=5432 dbname=cyborgdb user=cyborgdb password=secure_password" \
  -e CYBORGDB_API_KEY="your_cyborgdb_api_key" \
  cyborginc/cyborgdb-service:latest
# Or with Redis backend
docker run -d -p 8000:8000 \
  -e CYBORGDB_DB_TYPE=redis \
  -e CYBORGDB_CONNECTION_STRING="host=redis,port=6379,db=0" \
  -e CYBORGDB_API_KEY="your_cyborgdb_api_key" \
  cyborginc/cyborgdb-service:latest
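Before moving on, it's worth a quick check that the container actually started. The commands below simply locate the container by image and read its recent logs; the exact startup messages will vary by CyborgDB version:

# Confirm the CyborgDB container is running and inspect its startup logs
docker ps --filter "ancestor=cyborginc/cyborgdb-service:latest"
docker logs --tail 50 $(docker ps -q --filter "ancestor=cyborginc/cyborgdb-service:latest")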
Generate a secure encryption key for your index:

# Quick start: Generate a 32-byte key and encode as base64
export CYBORGDB_INDEX_KEY=$(openssl rand -base64 32)
echo "Save this key securely: $CYBORGDB_INDEX_KEY"

Important: For production deployments, we strongly recommend using a Key Management Service (KMS) instead of storing raw keys. Redpanda Connect supports integration with AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, and other KMS providers. See Redpanda's secrets management documentation for configuration details.
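As one hedged example of that pattern, if the index key lives in AWS Secrets Manager (the secret name cyborgdb/index-key below is an assumption), you can resolve it into the environment variable the pipeline references instead of writing the raw key into any file:

# Resolve the base64-encoded index key from AWS Secrets Manager at deploy time
export CYBORGDB_INDEX_KEY=$(aws secretsmanager get-secret-value \
  --secret-id cyborgdb/index-key \
  --query SecretString \
  --output text)

Redpanda Connect then reads the key through the ${CYBORGDB_INDEX_KEY} interpolation in the config below, so it never appears in the pipeline YAML.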
Add CyborgDB to your existing Redpanda Connect pipeline. Here's a complete example showing how to stream data with embeddings into encrypted storage:
# Your existing input and processors remain unchanged
input:
  kafka:
    addresses: ["localhost:9092"]
    topics: ["your_topic"]
    consumer_group: "your_consumer_group"

pipeline:
  processors:
    # Your existing processors...
    - label: "generate_embedding"
      # Your embedding generation logic

# Add CyborgDB as the output for encrypted vector storage
output:
  cyborgdb:
    host: "localhost:8000"                   # CyborgDB service endpoint
    api_key: "${CYBORGDB_API_KEY}"           # Your CyborgDB API key
    index_name: "production_vectors"         # Name for your encrypted index
    index_key: "${CYBORGDB_INDEX_KEY}"       # Base64-encoded 32-byte encryption key
    create_if_missing: true                  # Auto-create index on first write
    operation: "upsert"                      # upsert or delete

    # Map your document ID
    id: "${! json(\"id\") }"                 # Extract ID from your message

    # Map your embedding vector
    vector_mapping: "root = this.embedding"  # Path to embedding array

    # Optional: Include metadata for filtering
    metadata_mapping: |
      root = {
        "timestamp": this.timestamp,
        "category": this.category,
        "user_id": this.user_id
      }

    # Batching for optimal performance
    batching:
      count: 100     # Batch size
      period: "1s"   # Max wait time
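For reference, here's the shape of message the mappings above assume is arriving at the output after embedding generation. The field names mirror the config (id, embedding, timestamp, category, user_id), and only a few vector dimensions are shown where a real embedding would have hundreds or thousands:

{
  "id": "evt-000123",
  "timestamp": "2025-01-14T10:32:07Z",
  "category": "checkout",
  "user_id": "u-84721",
  "embedding": [0.0132, -0.2871, 0.4419]
}

Against this message, id resolves to "evt-000123", vector_mapping pulls out the embedding array, and metadata_mapping stores the remaining three fields alongside the encrypted vector so they can be used as filters at query time.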
Essential settings:
- host: Your CyborgDB service endpoint
- api_key: Authentication key from cyborg.co
- index_name: Unique name for your encrypted vector collection
- index_key: Base64-encoded 32-byte encryption key (use the Redpanda Secrets Guide)

Data mappings:
- id: Unique identifier for each vector (required)
- vector_mapping: Bloblang expression to extract the embedding array
- metadata_mapping: Optional metadata for filtering during search

Performance tuning:
- batching.count: Number of vectors to batch (100-500 recommended)
- batching.period: Maximum time to wait before sending a partial batch
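The configs in this post leave the generate_embedding step as a placeholder. One hedged way to fill it is with Redpanda Connect's standard branch and http processors calling an external embedding API; the endpoint, model name, and response shape below assume an OpenAI-style service, so adapt them to whatever embedding provider you actually use:

pipeline:
  processors:
    - label: "generate_embedding"
      # Send only the text field to the embedding API, then merge the
      # returned vector back into the original message
      branch:
        request_map: |
          root.model = "text-embedding-3-small"  # assumed model name
          root.input = this.text                 # assumes a "text" field on the message
        processors:
          - http:
              url: "https://api.openai.com/v1/embeddings"  # assumed endpoint
              verb: POST
              headers:
                Authorization: "Bearer ${OPENAI_API_KEY}"
                Content-Type: "application/json"
        result_map: |
          # OpenAI-style responses put the vector at data[0].embedding
          root.embedding = this.data.index(0).embedding

Any provider works here as long as the step ends with the vector at the path your vector_mapping expects (this.embedding in the examples above).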
Fraud detection pipeline:

output:
  cyborgdb:
    host: "${CYBORGDB_HOST}"
    api_key: "${CYBORGDB_API_KEY}"
    index_name: "fraud_patterns"
    index_key: "${FRAUD_INDEX_KEY}"
    operation: "upsert"
    id: "${! json(\"transaction_id\") }"
    vector_mapping: "root = this.transaction_embedding"
    metadata_mapping: |
      root = {
        "amount": this.amount,
        "merchant_category": this.merchant_category,
        "risk_score": this.risk_score
      }

RAG document pipeline:
output:
  cyborgdb:
    host: "${CYBORGDB_HOST}"
    api_key: "${CYBORGDB_API_KEY}"
    index_name: "knowledge_base"
    index_key: "${KB_INDEX_KEY}"
    operation: "upsert"
    id: "${! json(\"doc_id\") }"
    vector_mapping: "root = this.content_embedding"
    metadata_mapping: |
      root = {
        "source": this.source,
        "department": this.department,
        "last_updated": this.timestamp,
        "access_level": this.access_level
      }
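Once documents are flowing into the knowledge_base index, your RAG service queries it with the same index key used at ingest, and CyborgDB performs the similarity search over the encrypted vectors. The endpoint path and request fields below are assumptions for illustration only; check the CyborgDB API documentation for the actual schema of your service version:

# Illustrative query call -- verify the path and payload against the CyborgDB docs
curl -s -X POST "http://localhost:8000/v1/indexes/knowledge_base/query" \
  -H "X-API-Key: ${CYBORGDB_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
        "index_key": "'"${CYBORGDB_INDEX_KEY}"'",
        "query_vector": [0.0132, -0.2871, 0.4419],
        "top_k": 5,
        "filters": {"department": "finance"}
      }'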
Cyborg sees these numbers in production deployments:

Key management:
Network security:
Compliance benefits:
For production environments, you can deploy both services together:
version: '3.8'

services:
  cyborgdb:
    image: cyborginc/cyborgdb-service:latest
    ports:
      - "8000:8000"
    environment:
      - CYBORGDB_DB_TYPE=postgres
      - CYBORGDB_CONNECTION_STRING=host=postgres port=5432 dbname=cyborgdb user=cyborgdb password=${DB_PASSWORD}
      - CYBORGDB_API_KEY=${CYBORGDB_API_KEY}
      - SSL_CERT_PATH=/certs/server.crt  # For HTTPS
      - SSL_KEY_PATH=/certs/server.key
    volumes:
      - ./certs:/certs
    depends_on:
      - postgres

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=cyborgdb
      - POSTGRES_USER=cyborgdb
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redpanda-connect:
    image: redpandadata/connect:latest
    volumes:
      - ./pipeline.yaml:/pipeline.yaml
    command: run /pipeline.yaml
    environment:
      - CYBORGDB_API_KEY=${CYBORGDB_API_KEY}
      - CYBORGDB_INDEX_KEY=${CYBORGDB_INDEX_KEY}
    depends_on:
      - cyborgdb

volumes:
  postgres_data:
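With the pipeline config from earlier saved as pipeline.yaml next to this compose file (that's the path the volume mount expects), the whole stack comes up with one command:

# Start PostgreSQL, CyborgDB, and Redpanda Connect together
docker compose up -d

# Tail the connector logs to confirm events are landing in the encrypted index
docker compose logs -f redpanda-connect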
Cyborg and Redpanda have created a streaming pipeline that solves a critical enterprise need: real-time AI that keeps your vectors encrypted even during search operations. By adding CyborgDB to your Redpanda Connect pipeline, you can finally deploy AI in regulated environments without compromising on security or performance.
The integration is straightforward: add the CyborgDB output to your existing pipeline configuration, generate an encryption key, and your vectors are automatically encrypted before storage and stay encrypted in use. Your compliance team gets the security they need, your engineering team gets a simple integration, and your data scientists get the real-time AI capabilities they want.
Ready to secure your streaming AI pipeline? Join the Redpanda Community Slack to discuss your use case, or get your CyborgDB API key to start building. Questions about compliance or enterprise features? Contact the Cyborg team.