Apache Gravitino is a geo-distributed, federated metadata lake that unifies metadata management and governance across diverse data and AI assets, supporting multi-engine access and direct integration with various sources.

Vendor

The Apache Software Foundation

Product details

Apache Gravitino

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake designed to unify metadata management across diverse data and AI assets. It provides direct metadata governance and access across multiple sources, formats, and regions, supporting modern data architectures including lakehouses and federated catalogs. Gravitino integrates with engines such as Trino, Spark, and Flink, and is built to scale across clouds and geographies.

Features

  • Unified metadata models for tabular and file-based data sources.
  • Direct metadata management with real-time synchronization.
  • Geo-distributed architecture for multi-region deployments.
  • REST API, with JDBC and Thrift interfaces planned for future releases.
  • Support for multiple query engines including Trino, Spark, and Flink.
  • Metadata lineage tracking via OpenLineage integration.
  • Credential vending for secure access to cloud storage (S3, GCS, OSS, ADLS).
  • FUSE support for filesets, enabling local-disk-like access to remote data.
  • Security enforcement with SQL and path-based authorization.
  • Model catalog for managing machine learning model metadata.
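The unified metadata model and REST API listed above organize objects in a hierarchy: a metalake contains catalogs, a catalog contains schemas, and a schema contains tables, filesets, or models. The minimal sketch below builds request paths for that hierarchy and prepares a GET request without sending it. It assumes the `/api/metalakes/...` path layout, a server at `http://localhost:8090`, and the `application/vnd.gravitino.v1+json` Accept header; these are assumptions to verify against the REST documentation for your Gravitino version. The helper names (`object_path`, `build_get_request`) and the sample object names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed default server address; adjust for your deployment.
BASE_URL = "http://localhost:8090"

# Assumed media type for the versioned REST API.
ACCEPT_HEADER = "application/vnd.gravitino.v1+json"

# Assumed path segments for object kinds that live under a schema.
LEAF_KINDS = {"tables", "filesets", "models"}


def object_path(metalake, catalog=None, schema=None, kind=None, name=None):
    """Build a REST path for metalake -> catalog -> schema -> leaf objects.

    Each level requires the level above it; every segment is URL-escaped.
    """
    if schema is not None and catalog is None:
        raise ValueError("schema requires a catalog")
    if kind is not None and schema is None:
        raise ValueError("object kind requires a schema")
    if name is not None and kind is None:
        raise ValueError("object name requires an object kind")
    if kind is not None and kind not in LEAF_KINDS:
        raise ValueError(f"unknown object kind: {kind}")

    parts = ["api", "metalakes", metalake]
    if catalog is not None:
        parts += ["catalogs", catalog]
    if schema is not None:
        parts += ["schemas", schema]
    if kind is not None:
        parts.append(kind)
    if name is not None:
        parts.append(name)
    # safe="" also escapes "/" so a name cannot add extra path levels.
    return "/" + "/".join(quote(p, safe="") for p in parts)


def build_get_request(path):
    """Prepare (but do not send) a GET request for the given path."""
    return Request(BASE_URL + path, headers={"Accept": ACCEPT_HEADER}, method="GET")


if __name__ == "__main__":
    path = object_path("metalake_demo", "catalog_hive", "sales", "tables", "orders")
    print(path)  # /api/metalakes/metalake_demo/catalogs/catalog_hive/schemas/sales/tables/orders
    req = build_get_request(path)
    print(req.full_url, req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would send it to a running server. Omitting `name` yields a collection path (for example, all models in a schema), which is how a listing call would typically be addressed under the assumed layout.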

Capabilities

  • Manages metadata across relational databases, file systems, and event streams.
  • Provides centralized governance including access control, auditing, and discovery.
  • Enables federated access to catalogs like Hive, Iceberg, Hudi, and Paimon.
  • Supports metadata caching and lineage tracking for AI workflows.
  • Offers a CLI and a Python client for metadata operations.
  • Integrates with Apache Ranger for policy enforcement.
  • Facilitates metadata federation across clouds and regions.

Benefits

  • Simplifies metadata governance across heterogeneous systems.
  • Enhances data discoverability and trust with lineage and auditing.
  • Improves security with centralized credential management and policy enforcement.
  • Reduces operational complexity through unified APIs and connectors.
  • Supports AI and ML workflows with model versioning and metadata tracking.
  • Scales with enterprise needs across hybrid and multi-cloud environments.
  • Open-source and backed by the Apache Software Foundation.