Apache Uniffle is a remote shuffle service designed to optimize data shuffle operations in distributed computing frameworks such as Apache Spark and Hadoop MapReduce. It improves performance, scalability, and fault tolerance while supporting cloud-native deployments and multiple storage backends.

Vendor

The Apache Software Foundation

Product details

Apache Uniffle

Apache Uniffle is a high-performance, general-purpose remote shuffle service designed for distributed computing engines. It optimizes data shuffle operations in frameworks like Apache Spark and Hadoop MapReduce by reducing I/O overhead, improving reliability, and enabling elastic resource orchestration. Uniffle enhances performance and stability in large-scale data processing environments and supports cloud-native deployments.
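The core idea of a remote shuffle service is that map tasks push their output to a shuffle server instead of writing shuffle files to their own local disks, and the server groups that data by reduce partition so each reducer can read its partition from one place. The toy Python model below only illustrates that flow under simplifying assumptions: Uniffle's real shuffle server is written in Java, flushes based on configurable memory thresholds, and can write to several storage tiers. The class `ToyShuffleServer`, its methods, and the `flush_threshold_bytes` parameter are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class ToyShuffleServer:
    """Hypothetical in-process model of a shuffle server; not Uniffle's real API."""

    def __init__(self, flush_threshold_bytes: int) -> None:
        self.flush_threshold_bytes = flush_threshold_bytes
        self.buffers: dict[int, list[bytes]] = defaultdict(list)  # partition -> blocks held in memory
        self.storage: dict[int, list[bytes]] = defaultdict(list)  # partition -> blocks "flushed" to storage
        self.buffered_bytes = 0

    def send_block(self, partition: int, block: bytes) -> None:
        # Map tasks push blocks here instead of writing shuffle files to their own local disk.
        self.buffers[partition].append(block)
        self.buffered_bytes += len(block)
        if self.buffered_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self) -> None:
        # Move all buffered blocks to (simulated) persistent storage, grouped by partition.
        for partition, blocks in self.buffers.items():
            self.storage[partition].extend(blocks)
        self.buffers.clear()
        self.buffered_bytes = 0

    def read_partition(self, partition: int) -> list[bytes]:
        # A reducer reads its whole partition from one server: flushed blocks first, then in-memory ones.
        return self.storage[partition] + self.buffers[partition]


# Two map tasks each push one 4-byte block to each of two reduce partitions.
server = ToyShuffleServer(flush_threshold_bytes=10)
for map_id in range(2):
    for partition in range(2):
        server.send_block(partition, f"m{map_id}p{partition}".encode())

print(server.read_partition(0))  # [b'm0p0', b'm1p0']
print(server.read_partition(1))  # [b'm0p1', b'm1p1'] (the last block is still in memory)
```

The third block pushes the buffered total past the threshold and triggers a flush, so partition 1 ends up split between "storage" and memory; the reader still sees it as one ordered sequence, which is the behavior the memory-plus-storage design is meant to provide.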

Features

  • Remote shuffle service architecture with coordinator and shuffle server clusters
  • Supports multiple storage modes: memory, local disk, and remote storage (e.g., HDFS)
  • Compatible with Apache Spark (2.3.x to 3.3.x) and Hadoop MapReduce
  • Pluggable shuffle client for Spark and MapReduce
  • Efficient data caching and flushing mechanisms
  • Shuffle file format with index and data files for optimized access
  • Kubernetes Operator for deployment and management
  • Coordinator-managed dynamic client configuration and shuffle server assignment
  • Fault-tolerant shuffle data handling
  • Built-in support for speculation in Spark
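The index-plus-data file format listed above lets a reader locate a block by scanning a small index rather than the whole data file. The sketch below shows the general technique with an in-memory data file and a fixed-width index entry per block. The entry layout (offset, length, block ID) and the function names are hypothetical choices for illustration; this is not Uniffle's actual on-disk format, whose index entries carry additional fields.

```python
from __future__ import annotations

import io
import struct

# Hypothetical index-entry layout: offset (int64), length (int32), block_id (int64), big-endian.
INDEX_ENTRY = struct.Struct(">qiq")


def write_partition(blocks: dict[int, bytes]) -> tuple[bytes, bytes]:
    """Append each block to a data file and record its position in an index file."""
    data, index = io.BytesIO(), io.BytesIO()
    for block_id, payload in blocks.items():
        offset = data.tell()
        data.write(payload)
        index.write(INDEX_ENTRY.pack(offset, len(payload), block_id))
    return data.getvalue(), index.getvalue()


def read_block(data: bytes, index: bytes, wanted_id: int) -> bytes | None:
    """Scan the compact index, then slice the data file at the recorded offset."""
    for offset, length, block_id in INDEX_ENTRY.iter_unpack(index):
        if block_id == wanted_id:
            return data[offset:offset + length]
    return None


data_file, index_file = write_partition({101: b"alpha", 102: b"beta", 103: b"gamma"})
print(len(index_file))                         # 60 (3 entries x 20 bytes)
print(read_block(data_file, index_file, 102))  # b'beta'
```

Because every index entry has the same width, a reader can also seek directly to the n-th entry or read the index in large sequential chunks, which keeps lookups cheap even when the data file is large.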

Capabilities

  • Enables remote shuffle for distributed computing frameworks
  • Reduces random I/O and connection overhead during shuffle operations
  • Improves job reliability by reducing shuffle failures caused by executor memory pressure and local disk faults
  • Supports dynamic allocation and speculative execution in Spark
  • Integrates with HDFS and other remote storage systems
  • Provides scalable shuffle infrastructure for large workloads
  • Facilitates deployment in Kubernetes environments
  • Offers flexible configuration for production-grade setups
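The claim that remote shuffle reduces random I/O can be made concrete with a back-of-the-envelope model. In a conventional local shuffle, each reducer fetches one small segment from every map output, giving roughly M × R reads whose average size shrinks as both counts grow. If each partition is instead aggregated on a shuffle server and read as a unit, the read count drops to roughly R, with much larger sequential reads. The function below and its simplifications (one read per partition, no chunking, no partition split across servers) are assumptions for illustration; real Uniffle clients read in batches, so actual request counts fall somewhere between the two figures.

```python
from __future__ import annotations


def shuffle_read_profile(total_bytes: int, num_maps: int, num_reducers: int) -> dict[str, float]:
    """Back-of-the-envelope read counts for local vs. remote (partition-aggregated) shuffle."""
    # Local shuffle: every reducer fetches one segment from every map output file.
    local_reads = num_maps * num_reducers
    # Remote shuffle (simplified): each partition is aggregated on one server and read once.
    remote_reads = num_reducers
    return {
        "local_reads": local_reads,
        "local_avg_read_bytes": total_bytes / local_reads,
        "remote_reads": remote_reads,
        "remote_avg_read_bytes": total_bytes / remote_reads,
    }


# 1 TB (10**12 bytes) shuffled between 10,000 map tasks and 10,000 reduce tasks.
profile = shuffle_read_profile(total_bytes=10**12, num_maps=10_000, num_reducers=10_000)
print(profile["local_reads"], profile["local_avg_read_bytes"])    # 100000000 10000.0
print(profile["remote_reads"], profile["remote_avg_read_bytes"])  # 10000 100000000.0
```

In this example the local shuffle performs 100 million reads averaging about 10 KB each, while the aggregated layout performs 10,000 reads averaging about 100 MB each, which is the shift from small random reads to large sequential ones that remote shuffle aims for.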

Benefits

  • Enhances performance of data-intensive applications
  • Reduces resource consumption and improves system stability
  • Simplifies shuffle management across distributed systems
  • Supports elastic scaling and orchestration in cloud-native environments
  • Improves fault tolerance and job success rates
  • Enables consistent shuffle behavior across multiple frameworks
  • Promotes modular and maintainable architecture