
Celeborn is an intermediate data service for Big Data compute engines powering ETL, OLAP, and streaming workloads, designed to boost performance, stability, and flexibility by managing shuffle and spilled data efficiently.

Vendor

The Apache Software Foundation

Company Website

Product details

Apache Celeborn

Apache Celeborn is an intermediate data service designed to optimize data exchange in distributed Big Data compute engines across ETL, OLAP, and streaming workloads. It addresses inefficiencies in traditional shuffle frameworks by reorganizing and managing shuffle and spilled data in a more performant and scalable way. Celeborn decouples shuffle data storage from compute nodes, enabling disaggregated architectures and improving overall system flexibility and stability.
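In practice, adopting Celeborn from an engine such as Apache Spark is largely a matter of client configuration: the engine's shuffle manager is pointed at Celeborn, and the client is told where the Celeborn Master lives. The sketch below is illustrative only; the exact property names and the shuffle-manager class vary by Celeborn and Spark version, and the endpoint host/port are placeholders.

```properties
# spark-defaults.conf (illustrative sketch; verify keys against your Celeborn version)

# Route Spark's shuffle through Celeborn instead of the built-in shuffle.
spark.shuffle.manager            org.apache.spark.shuffle.celeborn.SparkShuffleManager

# Celeborn Master endpoint(s); host and port here are placeholders.
spark.celeborn.master.endpoints  celeborn-master:9097

# Celeborn replaces Spark's external shuffle service.
spark.shuffle.service.enabled    false
```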

Features

  • Efficient shuffle data management across distributed systems
  • Support for multiple storage layers: memory, local disks, distributed file systems, and object stores
  • High availability via Raft-based Master node architecture
  • Integration with Apache Spark, Apache Flink, and Hadoop MapReduce
  • Modular architecture with Master, Worker, and Client components
  • Fine-grained control over shuffle lifecycle and metadata
  • Optimized disk and network usage through data reorganization
  • Fault-tolerant data handling and recovery mechanisms
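As a sketch of how the multi-tier storage and Raft-based HA features above surface in deployment, a cluster configuration might resemble the following. The key names reflect Celeborn's documented configuration surface but should be treated as assumptions; hostnames and paths are placeholders.

```properties
# celeborn-defaults.conf (illustrative sketch; verify keys against your Celeborn version)

# Raft-based Master HA: list every Master replica so clients and Workers
# can fail over if the leader goes down.
celeborn.master.endpoints    master-1:9097,master-2:9097,master-3:9097
celeborn.master.ha.enabled   true

# Per-Worker local-disk storage directories (placeholder paths); each Worker
# can be configured with its own storage layout.
celeborn.worker.storage.dirs /mnt/disk1/celeborn,/mnt/disk2/celeborn
```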

Capabilities

  • Centralized shuffle data service decoupled from compute nodes
  • Slot-based allocation and reservation for shuffle operations
  • Logical partitioning of shuffle data for efficient access
  • Sequential data reading with minimal network connections
  • Dynamic partition splitting for large or failed data pushes
  • LifecycleManager and ShuffleClient roles for control and data planes
  • Compatibility with disaggregated compute-storage architectures
  • Configurable storage strategies per Worker node
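The dynamic partition splitting capability above can be illustrated with a toy model. This is not Celeborn's API, just a minimal sketch of the idea: once a logical partition's accumulated bytes cross a threshold, subsequent pushes are redirected into a fresh split, so no single partition file grows without bound. The class name, threshold, and chunk sizes are all invented for illustration.

```python
# Toy sketch of dynamic partition splitting (conceptual, not Celeborn code).
SPLIT_THRESHOLD = 1024  # bytes per split; illustrative, real limits are far larger

class PartitionWriter:
    """Accumulates pushed chunks, rolling over to a new split at the threshold."""

    def __init__(self, partition_id: int):
        self.partition_id = partition_id
        self.splits = [[]]      # each split is a list of pushed chunks
        self.split_sizes = [0]  # bytes accumulated per split

    def push(self, chunk: bytes) -> None:
        # Redirect to a fresh split once the current one reaches the threshold.
        if self.split_sizes[-1] >= SPLIT_THRESHOLD:
            self.splits.append([])
            self.split_sizes.append(0)
        self.splits[-1].append(chunk)
        self.split_sizes[-1] += len(chunk)

    def num_splits(self) -> int:
        return len(self.splits)

writer = PartitionWriter(partition_id=0)
for _ in range(5):
    writer.push(b"x" * 512)  # five 512-byte pushes cross the 1024-byte threshold twice
print(writer.num_splits())   # -> 3
```

Real shuffle services make the same decision server-side and must also coordinate split metadata with readers; the sketch only shows the write-side rollover.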

Benefits

  • Boosts performance of distributed compute engines by reducing shuffle overhead
  • Enhances system stability and scalability through centralized data management
  • Reduces local storage requirements on compute nodes
  • Simplifies integration with existing Big Data frameworks
  • Improves resource utilization and reduces network congestion
  • Enables flexible deployment strategies across heterogeneous environments
  • Facilitates efficient data access patterns for large-scale analytics