
Apache Samza
Apache Samza is a distributed stream processing framework that enables stateful applications to process real-time data from multiple sources with low latency and high throughput. It supports flexible deployment and integrates with systems like Kafka, HDFS, and cloud services.
Vendor
The Apache Software Foundation



Product details
Apache Samza
Apache Samza is a distributed stream processing framework designed for building stateful applications that process real-time data from multiple sources. It has been used in production at large scale and supports several deployment options: running on YARN or Kubernetes, or as an embedded library. Samza integrates with systems such as Apache Kafka, HDFS, AWS Kinesis, and Azure Event Hubs.
Features
- High-performance stream processing with low latency and high throughput.
- Horizontal scalability with support for terabytes of state and thousands of cores.
- Rich APIs including Streams DSL, Samza SQL, Apache Beam, and low-level task APIs.
- Unified API for both batch and streaming data.
- Pluggable architecture for integrating with various data sources and sinks.
- Flexible deployment: standalone, embedded, or managed via cluster managers.
- Fault tolerance through host affinity and incremental checkpointing.
- Asynchronous processing for high-throughput remote I/O.
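To illustrate the asynchronous-processing point above, here is a minimal Python sketch of why overlapping remote calls raises throughput. It uses plain `asyncio` rather than Samza's own API, the remote call is simulated with a short sleep, and the names `fake_remote_lookup`, `process_all`, and `run_demo` are hypothetical.

```python
import asyncio
import time


async def fake_remote_lookup(key: str) -> str:
    """Hypothetical stand-in for a remote call (database, REST service)."""
    await asyncio.sleep(0.05)  # simulated network latency
    return key.upper()


async def process_all(keys, max_in_flight: int = 10):
    """Process every key, keeping at most max_in_flight calls outstanding."""
    limit = asyncio.Semaphore(max_in_flight)

    async def process_one(key):
        async with limit:
            return await fake_remote_lookup(key)

    # gather() returns results in input order, even if calls finish out of order.
    return await asyncio.gather(*(process_one(k) for k in keys))


def run_demo():
    """Run 20 simulated lookups and report the results and elapsed time."""
    keys = [f"user-{i}" for i in range(20)]
    start = time.perf_counter()
    results = asyncio.run(process_all(keys))
    elapsed = time.perf_counter() - start
    return results, elapsed
```

Issued one at a time, the 20 simulated calls would take about one second in total. With up to 10 in flight, the waits overlap and the batch finishes in a fraction of that, which is the effect asynchronous processing aims for when a job depends on remote I/O.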
Capabilities
- Real-time data processing from multiple sources with guaranteed at-least-once delivery.
- Stateful stream processing using scalable, fault-tolerant local state stores.
- Stream partitioning and parallel task execution for efficient scaling.
- Event-time and processing-time semantics for accurate time-based operations.
- Dynamic task migration and recovery using changelogs and host-affinity.
- Embedded library mode for lightweight integration into existing applications.
- Managed service mode for large-scale deployments using YARN or Kubernetes.
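The partitioning and changelog-based recovery described above can be sketched in a few lines of Python. This is a conceptual model only, not Samza's API: the class `CountingTask`, the functions `partition_for` and `run_demo`, and the use of in-memory lists as changelogs are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class CountingTask:
    """Hypothetical task that owns one partition and counts events per key."""

    def __init__(self, changelog):
        self.changelog = changelog  # stands in for a durable, append-only log
        self.store = {}             # local state, lost if the task crashes

    def process(self, key: str) -> None:
        new_count = self.store.get(key, 0) + 1
        self.store[key] = new_count
        self.changelog.append((key, new_count))  # record every state change

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state by replaying the changelog; last write wins."""
        task = cls(changelog)
        for key, value in changelog:
            task.store[key] = value
        return task


def run_demo(events, num_partitions: int = 4):
    """Route each event to the task that owns its partition."""
    changelogs = defaultdict(list)
    tasks = {p: CountingTask(changelogs[p]) for p in range(num_partitions)}
    for key in events:
        tasks[partition_for(key, num_partitions)].process(key)
    return tasks, changelogs
```

Because every event for a given key lands on the same partition, each task can keep its counts in local state without coordinating with other tasks. If a task is lost, calling `CountingTask.restore` on its changelog rebuilds the same store on any host. In a real deployment the log would be a durable stream rather than a Python list.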
Benefits
- Scalability and reliability proven in production by companies like LinkedIn, Uber, and Slack.
- Flexibility to run in diverse environments from cloud to bare-metal.
- Efficient resource usage with incremental state flushing and local storage.
- Simplified development with declarative and imperative APIs.
- Resilience to failures with fast recovery and minimal downtime.
- Open-source and community-driven under the Apache Software Foundation.