Name: Apache Paimon
Brand: The Apache Software Foundation


Apache Paimon is a lakehouse storage format that supports real-time and batch processing with engines such as Flink and Spark. It combines a lake format with a log-structured merge-tree (LSM) structure to enable real-time streaming updates, flexible data management, and efficient querying for large-scale data architectures.

Vendor

The Apache Software Foundation

Company Website

https://paimon.apache.org

YouTube

https://www.youtube.com/c/TheApacheFoundation

Product details

Apache Paimon

Apache Paimon is an open-source lakehouse storage format designed to support real-time and batch data processing. It combines the benefits of data lakes and data warehouses by enabling streaming updates, efficient querying, and scalable metadata management. Built to integrate with engines such as Apache Flink and Apache Spark, Paimon supports both append-only and primary-key tables, making it suitable for a wide range of analytical and transactional workloads.

Features

Real-time streaming updates with primary-key support
Flexible update mechanisms via merge engines
Changelog tracking for accurate stream analytics
Append-only tables for large-scale batch and streaming processing
Data skipping using min-max indexes for fast queries
Full schema evolution and time travel capabilities
Compaction with z-order sorting for optimized storage
Integration with Flink, Spark, Hive, Trino, and other engines
Unified table abstraction for batch and streaming modes
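
To make the idea of primary-key updates resolved by a merge engine more concrete, here is a minimal Python sketch. It is a toy in-memory model, not Paimon's API: the function names, the row layout, and the `apply_updates` helper are all assumptions made for illustration. It contrasts two strategies whose names mirror the "deduplicate" and "partial-update" merge engines: keeping only the latest row per key versus filling in only the non-null fields of a newer row.

```python
from typing import Callable, Dict, Iterable, Optional

# A row is a plain dict here; None stands in for a NULL field (assumption).
Row = Dict[str, Optional[object]]


def merge_deduplicate(old: Optional[Row], new: Row) -> Row:
    """Toy 'deduplicate' strategy: the newest row for a key replaces the old one."""
    return dict(new)


def merge_partial_update(old: Optional[Row], new: Row) -> Row:
    """Toy 'partial-update' strategy: only non-null fields of the new row overwrite."""
    merged: Row = dict(old) if old is not None else {}
    for field, value in new.items():
        if value is not None or field not in merged:
            merged[field] = value
    return merged


def apply_updates(
    updates: Iterable[Row],
    key: str,
    merge: Callable[[Optional[Row], Row], Row],
) -> Dict[object, Row]:
    """Fold a stream of updates into one row per primary key (hypothetical helper)."""
    table: Dict[object, Row] = {}
    for row in updates:
        k = row[key]
        table[k] = merge(table.get(k), row)
    return table


# Hypothetical update stream keyed by "id".
updates = [
    {"id": 1, "name": "alice", "city": None},
    {"id": 1, "name": None, "city": "Berlin"},
    {"id": 2, "name": "bob", "city": "Oslo"},
]

dedup = apply_updates(updates, "id", merge_deduplicate)
partial = apply_updates(updates, "id", merge_partial_update)

print(dedup[1])    # latest row wins, so "name" is None
print(partial[1])  # fields are combined across both updates
```

In this sketch the choice of merge function is the only thing that changes between the two results, which is the design point behind a pluggable merge engine: the write path stays the same while the rule for reconciling rows with the same key varies.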

Capabilities

Supports hybrid read modes: batch snapshots, streaming offsets, and incremental snapshots
Enables CDC (Change Data Capture) ingestion from databases
Provides high-performance OLAP queries over large datasets
Stores columnar files with manifest-based metadata for efficient access
Uses LSM tree structure for scalable updates and queries
Compatible with object stores and distributed file systems
Facilitates real-time analytics with sub-minute query latency
Offers flexible partitioning and indexing strategies
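
As a rough illustration of how per-file statistics kept in metadata can support data skipping, the following Python sketch prunes files whose min-max range cannot match a range predicate. It is a simplified model under stated assumptions, not Paimon's actual manifest layout: the `FileStats` class, the `files_to_scan` function, and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical manifest entry: a data file plus min/max of one column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(manifest: List[FileStats], lo: int, hi: int) -> List[str]:
    """Return only files whose [min, max] range overlaps the query range [lo, hi]."""
    return [
        entry.path
        for entry in manifest
        if entry.max_value >= lo and entry.min_value <= hi
    ]


# Hypothetical manifest with three data files covering disjoint value ranges.
manifest = [
    FileStats("data-0.parquet", 0, 99),
    FileStats("data-1.parquet", 100, 199),
    FileStats("data-2.parquet", 200, 299),
]

# Query: WHERE value BETWEEN 150 AND 210
selected = files_to_scan(manifest, 150, 210)
print(selected)  # data-0.parquet is skipped without being opened
```

The pruning step reads only the small metadata entries, so files that cannot contain matching rows are never opened. Sorting data during compaction tends to make these ranges narrower, which lets more files be skipped.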

Benefits

Combines the flexibility of data lakes with the performance of data warehouses
Reduces latency for real-time data ingestion and querying
Simplifies data architecture with unified storage and access patterns
Enhances scalability and reliability for enterprise-grade workloads
Supports modern data engineering practices including stream processing and schema evolution
Enables cost-effective storage with efficient compaction and indexing
Promotes open-source collaboration and extensibility
