Name: Apache Iceberg
Brand: The Apache Software Foundation

Apache Iceberg is a high-performance open table format for large analytic datasets. It brings the reliability of SQL tables to big data and lets multiple engines, such as Spark, Flink, Trino, and Hive, work with the same tables concurrently. Its features include schema evolution, hidden partitioning, and time travel.

Vendor

The Apache Software Foundation

Company Website

https://apache.org

YouTube

https://www.youtube.com/c/TheApacheFoundation

Product details

Apache Iceberg

Apache Iceberg is an open-source, high-performance table format designed for managing large analytic datasets. It brings the reliability and simplicity of SQL tables to big data environments, enabling multiple compute engines such as Spark, Flink, Trino, Hive, and Impala to safely access and modify the same tables concurrently. Iceberg supports schema evolution, hidden partitioning, time travel, and rollback, which makes it well suited to modern data lake architectures.

Features

Full schema evolution including column add, drop, rename, and reorder
Hidden partitioning that derives partition values automatically, so queries are pruned without filtering on explicit partition columns
Time travel and rollback for reproducible queries and error recovery
Row-level deletes and updates using position and equality delete files
Advanced filtering with column-level and partition-level statistics
Optimistic concurrency for safe multi-writer environments
Serializable isolation ensuring atomic and consistent table changes
Support for branching and tagging of table versions
REST catalog and multiple language APIs for integration flexibility
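
The snapshot, time travel, rollback, and optimistic concurrency features listed above can be illustrated with a toy model. The sketch below is not the Iceberg API; ToyTable, Snapshot, CommitConflict, and commit_append are hypothetical names invented for illustration. It assumes a simplified rule in which a commit succeeds only if the writer's base snapshot is still the current one, whereas a real engine would typically retry the commit against the new table state.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: the data files visible at one point in time."""
    snapshot_id: int
    files: Tuple[str, ...]


class CommitConflict(Exception):
    """Raised when another writer committed after this writer read the table."""


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table with snapshot history."""
    snapshots: List[Snapshot] = field(default_factory=lambda: [Snapshot(0, ())])
    current_id: int = 0

    def snapshot(self, snapshot_id: int) -> Snapshot:
        # Time travel: any retained snapshot can still be read by its ID.
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap
        raise KeyError(f"unknown snapshot {snapshot_id}")

    def current(self) -> Snapshot:
        return self.snapshot(self.current_id)

    def commit_append(self, base_id: int, new_files: Iterable[str]) -> Snapshot:
        # Optimistic concurrency: the writer did its work against base_id
        # without holding a lock; the commit is accepted only if nobody
        # else moved the table forward in the meantime.
        if base_id != self.current_id:
            raise CommitConflict(
                f"table is at snapshot {self.current_id}, writer was based on {base_id}"
            )
        base = self.snapshot(base_id)
        snap = Snapshot(len(self.snapshots), base.files + tuple(new_files))
        self.snapshots.append(snap)
        self.current_id = snap.snapshot_id  # the single atomic "pointer swap"
        return snap

    def rollback(self, snapshot_id: int) -> None:
        # Rollback only moves the current pointer; history is kept.
        self.snapshot(snapshot_id)  # raises KeyError if the ID is unknown
        self.current_id = snapshot_id


table = ToyTable()
first = table.commit_append(table.current_id, ["a.parquet"])
table.commit_append(first.snapshot_id, ["b.parquet"])

try:
    # A second writer that still believes `first` is current loses the race.
    table.commit_append(first.snapshot_id, ["c.parquet"])
except CommitConflict as err:
    print("commit rejected:", err)

print(table.snapshot(first.snapshot_id).files)  # read an older snapshot
table.rollback(first.snapshot_id)
print(table.current().files)
```

Because snapshots are immutable and a commit is one pointer change, readers in this model never observe a half-applied write, which is the property the isolation and rollback items above depend on.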

Capabilities

Manages petabyte-scale tables with efficient metadata tracking
Enables fast scan planning; a distributed SQL engine is not needed to read a table or find its files
Supports multiple file formats including Parquet, Avro, and ORC
Integrates with cloud object stores and HDFS without relying on directory listings
Allows the partition layout to evolve as data volume or query patterns change
Provides snapshot-based access to table states for consistency and auditability
Facilitates cost-based optimization through rich metadata
Compatible with various compute engines and deployment environments
Offers extensible specification for cross-language and cross-platform support
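
As a rough illustration of how metadata tracking supports scan planning without directory listings, the sketch below prunes data files using a derived partition value and per-file minimum and maximum statistics. It is a simplified model, not Iceberg's implementation; DataFile, day_transform, and plan_scan are hypothetical names, and the file list stands in for what a table's metadata would record.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical metadata entry for one data file."""
    path: str
    partition_day: date   # derived from the event timestamp, not stored by the user
    min_ts: datetime      # lower bound of the timestamp column in this file
    max_ts: datetime      # upper bound of the timestamp column in this file


def day_transform(ts: datetime) -> date:
    """Partition transform: map a timestamp to its calendar day."""
    return ts.date()


def plan_scan(files: Iterable[DataFile], start: datetime, end: datetime) -> List[str]:
    """Return paths of files that may hold rows with start <= ts < end."""
    selected = []
    for f in files:
        # Partition pruning: the query filters on the timestamp itself, and
        # the planner converts that filter into a range of partition days.
        if f.partition_day < day_transform(start) or f.partition_day > day_transform(end):
            continue
        # Statistics pruning: skip files whose value range cannot overlap
        # the half-open query interval.
        if f.max_ts < start or f.min_ts >= end:
            continue
        selected.append(f.path)
    return selected


files = [
    DataFile("d1-a.parquet", date(2024, 1, 1),
             datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 12, 0)),
    DataFile("d1-b.parquet", date(2024, 1, 1),
             datetime(2024, 1, 1, 13, 0), datetime(2024, 1, 1, 23, 0)),
    DataFile("d2-a.parquet", date(2024, 1, 2),
             datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 9, 0)),
]

# Only files that can contain matching rows are returned; the rest are
# skipped using metadata alone, without opening or listing any files.
print(plan_scan(files, datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 12, 30)))
```

The planner here works entirely from the metadata entries, so the cost of planning depends on the number of tracked files that survive pruning rather than on scanning storage.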

Benefits

Simplifies big data table management with SQL-like semantics
Reduces query latency and improves performance through metadata pruning
Enhances data reliability and correctness in distributed environments
Supports agile data modeling with safe and flexible schema changes
Enables reproducible analytics and debugging with time travel
Reduces operational complexity with table maintenance procedures such as compaction and with isolated, atomic commits
Promotes open standards and avoids vendor lock-in
Scales efficiently with growing data volumes and concurrent workloads
