High-performance data streams: no longer a “pipe dream” for Lacework

Harnessing data streams to deliver better application security in the cloud

June 6, 2023

Lacework, a leader in cloud security services, offers a data-driven platform for application security at scale in the cloud. The Lacework platform provides security throughout the application lifecycle by collecting and analyzing data from customer environments, and informing customers of potential issues quickly.

“When we say our platform secures applications in the cloud throughout the entire lifecycle, we really mean the full life cycle,” says Chip Turner, Engineering Director at Lacework. “Every aspect of the Lacework platform depends on real-time data flow to protect our customers. From the start, we analyze the code that builds infrastructure, catching violations, and providing customers with actionable guidance. During build, we integrate with CI/CD pipelines and registries to scan containers and host images, and block vulnerable, risky images before production. And in production, we continuously monitor our customers' entire cloud infrastructure for unusual and malicious activity that could indicate an attack or issue that should be remediated quickly.”

To provide this end-to-end application lifecycle protection, Lacework collects and processes data from a wide variety of sources, with diverse formats, schemas, and fidelity. These include logs and monitoring data from hundreds of thousands of Lacework agents running in its customers’ cloud infrastructure. To deliver security insights back to those customers, Lacework relies on Redpanda, the Apache Kafka®-compatible streaming data platform for developers, as a hub of data exchange between its different analytical systems and microservices.

Finding a streaming data solution for cloud-native 1GBps+ performance

Before adopting Redpanda, Lacework used a different proprietary streaming data solution to build its application protection platform. But as Lacework released new capabilities, acquired more customers in a growing number of markets, and expanded to new clouds like Google Cloud, it needed a more scalable, reliable, and efficient solution to deliver high performance for its rapidly growing 1GBps+ streaming data workloads, without incurring high costs.

In late 2021, as the Lacework team considered its options for a new, cloud-agnostic streaming data solution, the most critical factor was finding a platform that could handle high throughput peak loads with predictable latency while meeting the organization’s stringent SLOs and availability requirements. These internal KPIs are necessary to ensure security efficacy for Lacework’s customers.

“The data volumes we collect from customers are highly variable by the hour; in fact, 10x spikes are quite common,” says Turner. “And we must isolate each customer’s data to ensure timely collection of data and insights. Amidst these challenges, our ingestion pipeline is expected to rapidly deliver data to multiple destinations such as the data warehouse, long-term storage, and downstream pipelines. So our streaming solution has to optimize for both throughput and latency.”

After running four POCs with different streaming solutions, some well-known and others newer to market, the Lacework team found that only Redpanda could excel in all the necessary dimensions of scalability, reliability and efficiency.

“We conducted a thorough evaluation of streaming data options and Redpanda was the clear winner,” says Turner. “In fact, during our benchmarking we hit the limitations of our existing benchmarking tool, but Redpanda was barely breaking a sweat. Now, Lacework can easily run 14.5 GB/sec of data through Redpanda at peak loads. We’re really seeing the benefit of Redpanda’s performance architecture, the fact that it’s written in C++ and does intelligent memory handling. It makes a huge difference.”

The Lacework team was relatively new to the Kafka APIs. Fortunately, Redpanda offered a much simpler and easier deployment, using a single binary and with no dependencies like JVM or Apache ZooKeeper™.

“While our engineering teams were ramping on the Kafka APIs, we knew the operational challenges of building core infrastructure on top of Java-based dependencies and legacy technologies would add unnecessary complexity and risk,” says Turner. “Redpanda was clearly designed both with modern developers and modern platform teams in mind. The single binary has everything you need to get to production, fast.”

Additionally, Lacework is a multi-cloud organization and needed a solution with flexible deployment options that wouldn’t lock them into a single vendor or cloud provider. Having a cloud-agnostic streaming data platform was critical for Lacework's future growth.

Building the streaming “pipe dream” with Redpanda

As its first use case for the new streaming data platform, the Lacework team built its production data ingestion stack around Redpanda, an architecture they affectionately dubbed “pipe dream” because it “looked too good to be true.”

In the new architecture, multiple Redpanda clusters provide high availability (HA) and disaster recovery; if a cluster goes down, Lacework can seamlessly redirect traffic from the faulty cluster to a healthy one.

A pipeline configuration service manages the Redpanda topic lifecycle — creating and migrating topics, scaling up and down partition counts, setting appropriate replication factors, and balancing topics across Redpanda clusters. This service also describes the overall topology for pipelines, from producers to consumers.

In turn, a pipeline scheduler leases pipelines (i.e. topics/partitions) to physical workers, ensuring topics are efficiently consumed into databases, sinks such as S3 for long-term storage, or Redpanda topics as the start of other pipelines.

The Lacework architecture

The new Lacework system is highly efficient: it estimates the resource requirements to process each partition, schedules tasks to the appropriate worker to maximize utilization (e.g. packing multiple small topics together onto a single node or reserving larger nodes for demanding high-throughput tasks), and reallocates pipelines to alternative workers if a worker is unhealthy.
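The scheduling behavior described above can be illustrated with a small sketch. This is not Lacework's implementation, which the article does not detail; it assumes a single scalar cost per partition and uses a simple best-fit-decreasing heuristic. The names `Worker`, `schedule`, and `reallocate`, and the capacity units, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    capacity: float  # hypothetical capacity units, e.g. MB/s the node can process
    healthy: bool = True
    assigned: dict = field(default_factory=dict)  # partition id -> estimated cost

    @property
    def free(self) -> float:
        return self.capacity - sum(self.assigned.values())


def schedule(partitions: dict, workers: list) -> dict:
    """Place partitions on healthy workers using best-fit decreasing.

    `partitions` maps a partition id to its estimated resource cost.
    Returns partition id -> worker name; partitions that fit nowhere
    are left out of the result.
    """
    placement = {}
    # Handle the most demanding partitions first so large nodes stay available for them.
    ordered = sorted(partitions.items(), key=lambda kv: kv[1], reverse=True)
    for pid, cost in ordered:
        candidates = [w for w in workers if w.healthy and w.free >= cost]
        if not candidates:
            continue
        # Best fit: choose the worker left with the least spare capacity,
        # which packs small partitions together on the same node.
        target = min(candidates, key=lambda w: w.free - cost)
        target.assigned[pid] = cost
        placement[pid] = target.name
    return placement


def reallocate(failed: Worker, workers: list) -> dict:
    """Mark a worker unhealthy and move its partitions to the remaining workers."""
    failed.healthy = False
    orphaned = dict(failed.assigned)
    failed.assigned.clear()
    return schedule(orphaned, workers)
```

With one small and one large worker, a single heavy partition lands on the large node while light partitions are packed onto the small one; calling `reallocate` on the small worker moves its partitions to whatever healthy capacity remains.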

“The Lacework solution built on Redpanda has solved a number of issues we faced with Kafka consumer groups and automatic rebalancing,” says Turner. “It improves resource utilization and achieves a high degree of fault isolation, even between partitions of the same topic.”

If a cluster becomes unhealthy, Lacework’s strict SLO and HA requirements mean the team needs to shift traffic to another cluster immediately and with minimal disruption, sometimes for hundreds or thousands of topics. Manually migrating topics from one cluster to another is error-prone and tedious. With the new architecture, Lacework has developed a virtual topics abstraction: a set of metadata describing physical topics in a physical cluster over time. This abstraction allows the team to automatically migrate topics from one cluster to another, regardless of consumer status or lag, with minimal hassle or production impact at scale.

“Our virtual Redpanda topics provide the illusion of a single topic on a single cluster, but can live in multiple clusters over time,” says Turner. “This allows us to more aggressively scale up topics to meet temporary increases in demand and just as easily scale down. With Redpanda and our new architecture, we have new levels of operational agility that are game-changing, so we can meet our stringent SLOs while keeping our developers focused on high-value work versus managing data infrastructure.”
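To make the idea of a virtual topic concrete, here is a minimal sketch of metadata that maps one logical topic to a sequence of physical topics on different clusters over time. The article does not describe Lacework's actual schema, so everything here is an assumption for illustration: the `Placement` and `VirtualTopic` classes, the integer `epoch` used as a stand-in for time, and the example cluster and topic names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Placement:
    cluster: str
    physical_topic: str
    start_epoch: int
    end_epoch: Optional[int] = None  # None while this placement still accepts writes


@dataclass
class VirtualTopic:
    name: str
    placements: List[Placement] = field(default_factory=list)

    def active(self) -> Placement:
        """Placement that producers should currently write to."""
        return self.placements[-1]

    def migrate(self, cluster: str, physical_topic: str, epoch: int) -> None:
        """Close the current placement (if any) and open a new one elsewhere."""
        if self.placements:
            last = self.placements[-1]
            self.placements[-1] = Placement(
                last.cluster, last.physical_topic, last.start_epoch, epoch
            )
        self.placements.append(Placement(cluster, physical_topic, epoch))

    def read_plan(self, from_epoch: int) -> List[Placement]:
        """Placements a consumer must drain, in order, to read from `from_epoch` on."""
        return [
            p for p in self.placements
            if p.end_epoch is None or p.end_epoch > from_epoch
        ]


# Hypothetical usage: one logical topic that moves from cluster-a to cluster-b.
vt = VirtualTopic("agent-events")
vt.migrate("cluster-a", "agent-events-v1", epoch=0)
vt.migrate("cluster-b", "agent-events-v2", epoch=5)
```

In this sketch, producers always follow `active()`, while a lagging consumer uses `read_plan` to finish the old physical topic before moving to the new one, which is one way a migration could proceed regardless of consumer lag.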

How virtual Redpanda topics enable operational agility

Learn more about the Lacework architecture in this Redpanda user conference talk.

Looking to the future: tiered storage, K8s, and beyond

Since starting on Redpanda in December 2021, Lacework has grown its CPU core footprint by more than 1,200% and now uses Redpanda to host all of its streaming data.

Lacework and Redpanda by the numbers

However, this is only the beginning of Lacework’s journey with Redpanda. Now that the organization’s data pipelines are running hyper-efficiently with Redpanda, the team’s next priority is to leverage Redpanda’s Tiered Storage to help reduce storage consumption on expensive EC2 nodes.

“With Redpanda’s tiered storage capability, we can save up to 30% or more of our storage costs,” says Turner. “These are the innovations that sold us on Redpanda as a partner for streaming data — they’re helping us control costs as we grow usage, which means they’re invested in us for the long run.”

As Lacework embarks on an infrastructure modernization project in 2023 to migrate to a managed Kubernetes environment with Amazon EKS, they will be taking Redpanda along for the ride.

“Today, we are using storage-optimized i3en AWS instances, run as bare metal self-managed EC2,” says Turner. “To improve our infrastructure efficiency and operability, we are planning a move to EKS in 2023. The team loves that Redpanda doesn’t care what infrastructure it’s running on and can be run with tooling and runbooks consistent with the rest of our infrastructure.”

For more stories about how Redpanda has helped companies stream data faster at lower cost, check out our customers page and browse the Redpanda Blog for examples and tutorials. If you have questions for the team, join the Redpanda Community on Slack and ask away.
