Apache Apex is a native YARN big data-in-motion platform that unifies stream and batch processing. It enables scalable, fault-tolerant, and high-performance data processing on Hadoop, supporting real-time analytics and operational simplicity for enterprise-grade applications.

Vendor

The Apache Software Foundation

Product details

Apache Apex

Apache Apex is an enterprise-grade native YARN platform for big data-in-motion that unifies stream and batch processing. Designed to run on Hadoop, it offers high scalability, fault tolerance, and performance for real-time and batch analytics. Apex simplifies application development with a modular architecture and a low barrier to entry, making it ideal for building robust, distributed data processing systems.

Features

  • Native integration with Hadoop YARN and HDFS
  • Unified stream and batch processing engine
  • Event-time windowing and high-level API support
  • Simple Java-based API for application development
  • Malhar library with reusable operators and connectors
  • Built-in fault tolerance and state management
  • Scalable architecture with dynamic resource allocation
  • Support for exactly-once processing semantics
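
To make the event-time windowing feature listed above more concrete, here is a minimal sketch in plain Java. It does not use any Apex classes; the class `EventTimeWindowSketch` and its methods are hypothetical names chosen for illustration. It only shows the general idea of assigning each event to a fixed-size (tumbling) window by the event's own timestamp rather than by arrival order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java, not the Apex windowing API.
final class EventTimeWindowSketch {
    private final long windowMillis;
    // Window start time (ms) -> number of events assigned to that window.
    private final Map<Long, Long> counts = new TreeMap<>();

    EventTimeWindowSketch(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    // Assign an event to a window using its own timestamp, not its arrival order.
    void add(long eventTimeMillis) {
        long windowStart = Math.floorDiv(eventTimeMillis, windowMillis) * windowMillis;
        counts.merge(windowStart, 1L, Long::sum);
    }

    long countFor(long windowStart) {
        return counts.getOrDefault(windowStart, 0L);
    }

    public static void main(String[] args) {
        EventTimeWindowSketch windows = new EventTimeWindowSketch(60_000L);
        // The third event arrives late but still belongs to the first window.
        long[] eventTimes = {5_000L, 61_000L, 59_999L, 120_000L};
        for (long t : eventTimes) {
            windows.add(t);
        }
        System.out.println(windows.counts); // prints {0=2, 60000=1, 120000=1}
    }
}
```

A production engine also has to decide how long to wait for late events before emitting a window's result; this sketch leaves that out.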

Capabilities

  • Real-time data ingestion and transformation
  • Batch analytics and scheduled processing
  • Operator lifecycle management and checkpointing
  • Elastic partitioning and parallel execution
  • Integration with messaging systems (Kafka, JMS, etc.)
  • Connectivity to databases (MySQL, Cassandra, MongoDB, etc.)
  • File system support (HDFS, S3, FTP, NFS)
  • REST API for monitoring and DAG visualization
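
The operator lifecycle and checkpointing capability listed above can be illustrated with a small, self-contained Java sketch. Everything in it is an assumption made for illustration: `CountingOperatorSketch`, `CheckpointSketch`, and their methods are hypothetical and are not the Apex API, and standard Java serialization stands in for whatever state-saving mechanism the platform actually uses. The idea shown is that state is snapshotted at a window boundary, so that after a failure the operator can be restored and the interrupted window replayed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical operator (not an Apex class): counts the tuples it has seen.
final class CountingOperatorSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private long total;              // committed state, included in checkpoints
    private transient long inWindow; // per-window scratch state, not checkpointed

    void beginWindow(long windowId) { inWindow = 0; }
    void process(String tuple) { inWindow++; }
    void endWindow() { total += inWindow; } // commit only at the window boundary
    long total() { return total; }
}

final class CheckpointSketch {
    // Snapshot the operator's state; intended to be called between windows.
    static byte[] checkpoint(CountingOperatorSketch op) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(op);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild an operator from a snapshot taken by checkpoint().
    static CountingOperatorSketch restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (CountingOperatorSketch) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not restore operator", e);
        }
    }

    // Drive one complete window through the operator.
    static void runWindow(CountingOperatorSketch op, long windowId, String... tuples) {
        op.beginWindow(windowId);
        for (String tuple : tuples) {
            op.process(tuple);
        }
        op.endWindow();
    }

    public static void main(String[] args) {
        CountingOperatorSketch op = new CountingOperatorSketch();
        runWindow(op, 1, "a", "b");
        byte[] snapshot = checkpoint(op); // taken at the end of window 1

        // Simulated failure: window 2 starts but never reaches endWindow().
        op.beginWindow(2);
        op.process("c");

        // Recovery: restore the snapshot and replay window 2 from its start.
        CountingOperatorSketch recovered = restore(snapshot);
        runWindow(recovered, 2, "c", "d");
        System.out.println(recovered.total()); // prints 4
    }
}
```

Because the partially processed window is discarded and replayed in full, the tuple "c" is counted once rather than twice, which is the intuition behind combining checkpoints with replay.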

Benefits

  • High throughput and low latency for big data applications
  • Simplified development with reusable components
  • Resilient to hardware and process failures
  • Seamless scaling across Hadoop clusters
  • Reduced operational complexity via platform-managed concerns
  • Ideal for enterprise-grade data pipelines and analytics
  • Enables rapid prototyping and deployment of data applications