Simple, fast, and scalable serverless stream processing with DeltaStream and Redpanda

Meet the dream team for all your streaming database and real-time analytics use cases

June 15, 2023

In Redpanda, you have a blazingly fast streaming data platform to source and transport your data. With DeltaStream, you can leverage a single platform for all your streaming database and real-time analytics use cases.

With features like role-based access control (RBAC), namespacing, and query isolation, DeltaStream not only simplifies stream processing but also brings battle-tested concepts like data hierarchy and data sharing from the relational database world into the streaming data world.

This post walks you through a credit card fraud detection example where you’ll learn how to connect your Redpanda cluster to DeltaStream and construct common modules for fraud detection by leveraging DeltaStream’s unique capabilities.

Set up Redpanda and DeltaStream

You can get everything set up in just two steps. Before you start, sign up for a DeltaStream free trial and log into the CloudUI. Then do the following to write your first query:

Step 1: Connect your Redpanda cluster to DeltaStream

Navigate to the Stores tab and click the + button at the bottom right. Here you can add the broker endpoints (bootstrap URLs) of your cluster. Click Next and configure authentication by choosing SHA 512 (SASL/SCRAM-SHA-512) from the dropdown and entering your Redpanda username and password.
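The values you enter in this dialog are standard Kafka client connection settings. The sketch below assumes your Redpanda Cloud cluster accepts SASL/SCRAM-SHA-512 over TLS. Under that assumption, the same values map to the librdkafka-style properties shown here. You could pass them to any librdkafka-based client to check your credentials before adding the store. The helper name, endpoint, and credentials are hypothetical placeholders, not part of DeltaStream.

```python
from __future__ import annotations


def redpanda_client_config(
    bootstrap_servers: list[str], username: str, password: str
) -> dict[str, str]:
    """Build librdkafka-style properties for SASL/SCRAM-SHA-512 over TLS.

    Hypothetical helper: assumes the cluster accepts SCRAM-SHA-512 over TLS,
    matching the "SHA 512" option chosen in the DeltaStream store dialog.
    """
    if not bootstrap_servers:
        raise ValueError("at least one bootstrap server is required")
    return {
        # Same host:port endpoints entered in the Stores dialog, comma-separated.
        "bootstrap.servers": ",".join(bootstrap_servers),
        # Authenticate with SASL over an encrypted (TLS) connection.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "SCRAM-SHA-512",
        "sasl.username": username,
        "sasl.password": password,
    }


if __name__ == "__main__":
    # Placeholder endpoint and credentials; substitute your own cluster's values.
    cfg = redpanda_client_config(
        ["seed-0.example.cloud.redpanda.com:9092"], "alice", "secret"
    )
    print(cfg["bootstrap.servers"])
```

The sketch only builds the dictionary. Handing it to a client such as confluent-kafka's Producer is what would actually open an authenticated connection.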

Connecting a Redpanda Cloud cluster to DeltaStream

Now you have successfully connected your Redpanda cluster to DeltaStream! To verify, click on the store to expand the view and select the Topics tab. Here you should be able to see the topics on the cluster.

Once you connect the Redpanda cluster you should be able to see topics and print messages

Step 2: Create a database

Navigate to the Database tab, create a new database, give it a name, and click Save when prompted. Now you are ready to process your data streams.

This is where you run all your stream processing queries

Note: With Databases in DeltaStream, you can organize your streaming data in a hierarchical namespace. A database is a logical grouping of schemas, and a schema is a logical grouping of relational objects such as Streams, Changelogs, Materialized Views, and Tables. This powerful feature opens up a lot of interesting capabilities, such as separating your environments (dev, pre-prod, and prod) and isolating workloads.
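Here is a toy Python model of how a fully qualified name such as db1.public.flag_fraudStream (used later in this post) could resolve through the database → schema → object hierarchy. It is not DeltaStream's implementation. It assumes unqualified names resolve against a current database and a default schema named public. The Catalog class is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy database -> schema -> object hierarchy (illustrative only)."""

    # databases[db][schema] is the set of object names (streams, changelogs, ...).
    databases: dict[str, dict[str, set[str]]] = field(default_factory=dict)

    def add(self, db: str, schema: str, name: str) -> None:
        self.databases.setdefault(db, {}).setdefault(schema, set()).add(name)

    def resolve(self, name: str, current_db: str, current_schema: str = "public") -> str:
        """Expand a 1-, 2-, or 3-part name to db.schema.object and check it exists."""
        parts = name.split(".")
        if len(parts) == 1:
            db, schema, obj = current_db, current_schema, parts[0]
        elif len(parts) == 2:
            db, schema, obj = current_db, parts[0], parts[1]
        elif len(parts) == 3:
            db, schema, obj = parts
        else:
            raise ValueError(f"invalid name: {name!r}")
        if obj not in self.databases.get(db, {}).get(schema, set()):
            raise KeyError(f"{db}.{schema}.{obj} not found")
        return f"{db}.{schema}.{obj}"


if __name__ == "__main__":
    catalog = Catalog()
    # The same object name lives independently in separate environments.
    catalog.add("dev", "public", "transactions")
    catalog.add("prod", "public", "transactions")
    print(catalog.resolve("transactions", current_db="dev"))  # dev.public.transactions
    print(catalog.resolve("prod.public.transactions", current_db="dev"))  # prod.public.transactions
```

The same short name resolves to different objects depending on the current database. This is what makes the dev/pre-prod/prod separation described above work without renaming anything.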

Databases in DeltaStream are logical namespaces that provide workload isolation and security and improve developer velocity

Now you can proceed to the SQL tab and start writing queries; see the DeltaStream SQL reference for the full syntax. There is no need for you to think about infrastructure, workload scoping, or cluster sizing. Just start writing your queries and DeltaStream will handle the rest.

Beyond stream processing and Materialized Views, DeltaStream comes with some very exciting features and concepts that will elevate your ability to manage streaming data at every level. Here are a few:

  • RBAC: Securing and sharing event streams in real time has been a challenge with stream processing systems. With DeltaStream’s role-based access control, you grant privileges on data sets to roles and assign those roles to users, controlling who has access and what is accessible. RBAC, along with the hierarchical namespaces, provides a completely new way of looking at event streams compared to other stream processing solutions that employ flat namespacing.
  • Query Isolation: In most stream processing solutions, jobs/queries are deployed within the boundaries of a cluster. This results in the “noisy neighbor” problem, where multiple queries/jobs compete for shared resources. With DeltaStream, you can run your complex stream processing queries in complete isolation. This lets you deploy new use cases rapidly and scale existing ones independently, which means a shorter time to market (TTM) for new use cases and cost savings from more efficient use of the underlying infrastructure.
  • Federated Kafka: De-silo your streaming infrastructure and work across multiple Redpanda clusters seamlessly. You don’t have to deploy an independent stream processing “cluster” for every single Kafka cluster. Instead, you can connect your entire Kafka deployment (multiple clusters and environments) to DeltaStream and securely process all your streaming data in one place.

Now let’s take a look at a sample use case to demonstrate how you can manage, process, and share your event streams using Redpanda and DeltaStream.

Use case: credit card transaction processing for fraud detection, location-based promotions, and more!

Transactions are assessed based on various rules. As we develop this example, you will see how different streams of data are used to flag a transaction as fraudulent. This could range from simply flagging a transaction based on its amount to considering other factors like location, IP address, and the velocity/frequency of transactions. All these different models can be implemented using Redpanda and DeltaStream.

Beyond fraud detection, you can share the transaction data in real time with your marketing team to make in-store recommendations, or send it to relevant merchants so they can share information about ongoing promotions. This is extremely powerful and can open up a lot of opportunities to collaborate with various parties in real time with zero ETL.

Let’s start with fraud detection. Consider the following sample transaction event, which illustrates the event schema. Let’s see how we can use various operations to detect fraud.

{
  "transaction_id": "1234567890",
  "timestamp": "2023-03-17T12:30:00Z",
  "amount": 3724.99,
  "currency": "USD",
  "cardholder_name": "John Smith",
  "card_number": "************1234",
  "expiration_date": "06/25",
  "cvv": "***",
  "merchant_name": "Acme Clothing",
  "card_present": "NO",
  "merchant_location": "123 Main Street, Anytown USA"
}

Filtering

You can implement your fraud rules to flag incoming transactions in real time by filtering them based on predetermined criteria. For example, here we flag transactions with an amount greater than 2000 where the card is not present.

SELECT transaction_id, cardholder_name, merchant_name, merchant_location, amount FROM transactions WHERE amount > 2000 AND card_present = 'NO';

Once you run this query in DeltaStream, every incoming credit card transaction from your Redpanda cluster that meets these criteria gets flagged. The flagged transactions can be sent to downstream systems for further processing if needed.
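The rule itself is plain predicate logic. The Python sketch below mirrors the WHERE clause above on in-memory events, projecting fields from the sample event. This lets you unit-test a rule before deploying it. It is not how DeltaStream executes queries, and the function names are hypothetical.

```python
import json

# Trimmed version of the sample event above, keeping only the fields used here.
SAMPLE = json.loads(
    '{"transaction_id": "1234567890", "amount": 3724.99, "card_present": "NO",'
    ' "cardholder_name": "John Smith", "merchant_name": "Acme Clothing",'
    ' "merchant_location": "123 Main Street, Anytown USA"}'
)

FLAG_COLUMNS = ("transaction_id", "cardholder_name", "merchant_name", "merchant_location", "amount")


def is_suspicious(txn: dict, threshold: float = 2000.0) -> bool:
    """Mirror of the WHERE clause: amount above threshold and card not present."""
    return txn["amount"] > threshold and txn["card_present"] == "NO"


def flag(transactions):
    """Lazily yield the projected fields of every transaction that matches the rule."""
    for txn in transactions:
        if is_suspicious(txn):
            yield {col: txn[col] for col in FLAG_COLUMNS}


if __name__ == "__main__":
    in_store = {**SAMPLE, "transaction_id": "1234567891", "card_present": "YES"}
    for hit in flag([SAMPLE, in_store]):
        print(hit["transaction_id"], hit["amount"])  # prints: 1234567890 3724.99
```

Keeping the predicate in one function makes it easy to test boundary cases. For example, an amount of exactly 2000 is not flagged because the comparison is strict.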

Similarly, you can perform windowed aggregations to track the frequency of transactions within a specific time period, and use joins (interval and temporal) to build more context for complex fraud detection scenarios.
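A windowed aggregation counts events per key over fixed time intervals. The Python sketch below illustrates a velocity check; it is not DeltaStream syntax. It assigns each transaction to a 60-second tumbling window per card. It then emits an alert the first time a card exceeds three transactions in one window. The window size, threshold, and function names are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime


def window_start(ts: str, size_s: int) -> int:
    """Epoch second at which the tumbling window containing ts begins."""
    epoch = int(datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp())
    return epoch - epoch % size_s


def velocity_alerts(events, size_s: int = 60, max_txns: int = 3):
    """Yield (card_number, window_start) once per window where a card exceeds max_txns.

    Simplification: counts are kept for every window forever; a real stream
    processor would evict a window's state once the window closes.
    """
    counts = Counter()
    for e in events:
        key = (e["card_number"], window_start(e["timestamp"], size_s))
        counts[key] += 1
        if counts[key] == max_txns + 1:  # emit exactly once, on the first violation
            yield key


if __name__ == "__main__":
    # Five transactions on one card within the same minute trigger a single alert.
    burst = [
        {"card_number": "************1234", "timestamp": f"2023-03-17T12:30:{s:02d}Z"}
        for s in range(0, 50, 10)
    ]
    print(list(velocity_alerts(burst)))
```

Emitting on the transition to `max_txns + 1` rather than on every later event keeps downstream alerting from being flooded by one burst.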

Materialized views

DeltaStream supports materialized views that are kept continuously up to date as new events arrive. This opens up a lot of new capabilities. For example, you can maintain a materialized view of blacklisted IP addresses/users and perform an in-flight lookup to block online transactions initiated by blacklisted entities. You can also share this data with third-party merchants in real time so they have the most up-to-date information on bad actors.

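-- flagged_entities is an illustrative view name; ip_address assumes the flagged events carry the client IP, which the sample event above does not show.
CREATE MATERIALIZED VIEW flagged_entities AS SELECT ip_address, merchant_name, merchant_location FROM flagged_transactions;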
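Conceptually, an always up-to-date materialized view is state that is updated as each event arrives, so a lookup always reflects the latest data. The in-memory Python sketch below models the in-flight lookup described above. It assumes flagged events carry an ip_address field, which the sample event does not show. The FlaggedIpView class is hypothetical and not part of DeltaStream.

```python
from __future__ import annotations


class FlaggedIpView:
    """In-memory stand-in for an always-fresh view of flagged IP addresses.

    Hypothetical: assumes each flagged event carries an "ip_address" field.
    """

    def __init__(self) -> None:
        self._ips: set[str] = set()

    def apply(self, flagged_event: dict) -> None:
        # Update the view as soon as a flagged event arrives.
        self._ips.add(flagged_event["ip_address"])

    def should_block(self, txn: dict) -> bool:
        # In-flight lookup: block if the transaction's IP is already flagged.
        return txn.get("ip_address") in self._ips


if __name__ == "__main__":
    view = FlaggedIpView()
    view.apply({"ip_address": "203.0.113.7", "merchant_name": "Acme Clothing"})
    print(view.should_block({"ip_address": "203.0.113.7"}))   # True
    print(view.should_block({"ip_address": "198.51.100.1"}))  # False
```

Because every flagged event updates the state immediately, the lookup cost stays constant. No periodic batch refresh is needed between the fraud signal and the block decision.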

Share data

Using DeltaStream’s RBAC, you can now share the fraud-related data with multiple teams within the organization, such as the customer service or mobile app teams, or even with third-party users outside the organization, such as merchants on your platform who can block certain transactions/users. All of this happens in real time with zero ETL.

For example, consider a scenario where all the flagged transactions from the previous steps are written to a flagged_transactions topic. You can now create a stream on this topic and securely share it with third-party users like payment processors.

CREATE STREAM flag_fraudStream WITH('store' = 'transactions_redPandaCluster') AS SELECT * FROM flagged_transactions; 

Once the stream is created, you can share it by granting privileges on it to the right role using DeltaStream’s RBAC.

GRANT USAGE, SELECT ON db1.public.flag_fraudStream TO ROLE payment_processors; 
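The effect of the GRANT can be modeled as a lookup from users to roles to per-object privileges. This Python sketch is a simplified illustration of that idea, not DeltaStream's authorization logic. The AccessControl class and the user name acme_payments are hypothetical. Privilege inheritance through databases and schemas is deliberately ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessControl:
    """Simplified role-based access model; ignores database/schema inheritance."""

    # role -> fully qualified object name -> granted privileges
    grants: dict[str, dict[str, set[str]]] = field(default_factory=dict)
    # user -> roles assigned to that user
    user_roles: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, privileges: set[str], obj: str, role: str) -> None:
        self.grants.setdefault(role, {}).setdefault(obj, set()).update(privileges)

    def assign(self, user: str, role: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def can(self, user: str, privilege: str, obj: str) -> bool:
        # A user may act if any of their roles holds the privilege on the object.
        return any(
            privilege in self.grants.get(role, {}).get(obj, set())
            for role in self.user_roles.get(user, set())
        )


if __name__ == "__main__":
    acl = AccessControl()
    # Mirrors: GRANT USAGE, SELECT ON db1.public.flag_fraudStream TO ROLE payment_processors;
    acl.grant({"USAGE", "SELECT"}, "db1.public.flag_fraudStream", "payment_processors")
    acl.assign("acme_payments", "payment_processors")  # hypothetical third-party user
    print(acl.can("acme_payments", "SELECT", "db1.public.flag_fraudStream"))  # True
    print(acl.can("acme_payments", "SELECT", "db1.public.transactions"))      # False
```

Granting to a role rather than to individual users means onboarding another payment processor is a single role assignment. The grants themselves stay unchanged.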

A single pane of glass

Fraud detection use cases often involve building context by processing multiple real-time event streams, possibly coming from different platforms. Using DeltaStream, you can seamlessly access event streams across different Redpanda clusters and process them centrally, giving you a single pane of glass over all your streaming data.
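Centrally processing streams from several clusters starts with combining them into one time-ordered view. The Python sketch below illustrates the idea only and is not how DeltaStream implements it. It lazily merges per-cluster event iterators by timestamp and tags each event with its origin. The cluster names and the source_cluster field are hypothetical. The sketch also assumes each input is already time-ordered and uses uniformly formatted ISO-8601 UTC timestamps.

```python
import heapq
from typing import Dict, Iterable, Iterator


def _tag(cluster: str, events: Iterable[dict]) -> Iterator[dict]:
    # Copy each event and record which cluster it came from.
    for e in events:
        yield {**e, "source_cluster": cluster}


def unified_stream(sources: Dict[str, Iterable[dict]]) -> Iterator[dict]:
    """Lazily merge per-cluster streams into one timestamp-ordered stream.

    Assumes each source is already ordered by an ISO-8601 UTC "timestamp"
    string in the same format, so string order equals time order.
    """
    return heapq.merge(
        *(_tag(cluster, events) for cluster, events in sources.items()),
        key=lambda e: e["timestamp"],
    )


if __name__ == "__main__":
    us_east = [
        {"transaction_id": "a1", "timestamp": "2023-03-17T12:30:00Z"},
        {"transaction_id": "a2", "timestamp": "2023-03-17T12:30:20Z"},
    ]
    eu_west = [{"transaction_id": "b1", "timestamp": "2023-03-17T12:30:10Z"}]
    for e in unified_stream({"us-east": us_east, "eu-west": eu_west}):
        print(e["timestamp"], e["source_cluster"], e["transaction_id"])
```

`heapq.merge` holds only one pending event per source. This is why the merge can run over unbounded streams instead of requiring everything to be loaded first.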

Conclusion

With Redpanda and DeltaStream you can now access, manage, and process your streaming data at scale! With this powerful duo, you can easily accelerate your journey to becoming an event-driven organization while optimizing for performance and keeping your costs low. Plus, it’s:

  • Simple to get started. Go from setup to building apps in a matter of hours.
  • Cost-efficient. Smaller infrastructure footprint and easy to operate and manage.
  • Complete. Provides end-to-end capabilities that are required to build a streaming platform.
  • Scalable, reliable, and secure.

You can start building by signing up for a DeltaStream free trial! To keep exploring Redpanda, check the documentation and browse the Redpanda blog for tutorials. If you have any questions or want to chat with the team, join the Redpanda Community on Slack.
