Apache Cloudberry is an open-source MPP database built on PostgreSQL 14, designed for data warehousing, large-scale analytics, and AI/ML workloads, offering advanced enterprise capabilities and compatibility with Greenplum.

Vendor

The Apache Software Foundation

Company Website

Product details

Apache Cloudberry

Apache Cloudberry is a modern, open-source Massively Parallel Processing (MPP) database system currently incubating under the Apache Software Foundation. It is derived from the open-source version of the Pivotal Greenplum Database but built on a newer PostgreSQL 14 kernel, offering enhanced enterprise capabilities. Cloudberry is designed to serve as a high-performance data warehouse and is optimized for large-scale analytics, artificial intelligence (AI), and machine learning (ML) workloads.
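As a rough illustration of what "massively parallel processing" means here, the sketch below spreads rows across segments by hashing a distribution key, lets each segment compute a partial aggregate over its own rows, and then combines the partials on a coordinator. It is a conceptual toy only: the segment count, the CRC32 hash, and every function name are assumptions made for illustration and do not reflect Cloudberry's actual internals.

```python
from collections import defaultdict
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a segment (illustrative hash, not Cloudberry's)."""
    return crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Scatter rows so that each segment stores only its own share of the table."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def local_sum(segment_rows, group_field, value_field):
    """Work done independently on one segment: a partial SUM per group."""
    partial = defaultdict(int)
    for row in segment_rows:
        partial[row[group_field]] += row[value_field]
    return dict(partial)


def gather(partials):
    """Work done on the coordinator: merge the partial sums from all segments."""
    total = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


def mpp_sum(rows, key_field, group_field, value_field, num_segments=NUM_SEGMENTS):
    """Scatter the rows, aggregate per segment, then gather the final result."""
    segments = distribute(rows, key_field, num_segments)
    return gather(local_sum(seg, group_field, value_field) for seg in segments)


sales = [
    {"order_id": 1, "region": "east", "amount": 10},
    {"order_id": 2, "region": "west", "amount": 25},
    {"order_id": 3, "region": "east", "amount": 30},
    {"order_id": 4, "region": "north", "amount": 5},
]

totals = mpp_sum(sales, "order_id", "region", "amount")
```

The final totals do not depend on how rows happen to land on segments, which is why adding segments can spread the work without changing query results.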

Features

  • Built on PostgreSQL 14 for modern database capabilities
  • Derived from Greenplum Database 7 with improved architecture
  • Supports Massively Parallel Processing (MPP) for scalable performance
  • Row-column hybrid storage engine (PAX) for optimized data access
  • SQL query support for flexible data interaction
  • Tools for data loading, backup, and migration
  • Integrated CI/CD pipeline with Ubuntu build/test support
  • Security enhancements including CVE fixes and code scanning
  • Open-source license compliance and audit tools
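The row-column hybrid idea behind a PAX-style layout can be sketched in a few lines: rows are first split into groups, and within each group the values are stored column by column. Analytic scans can then read one column contiguously, while all fields of a single row still live in the same group. This is a minimal sketch of the general technique, not Cloudberry's on-disk format; the function names and the group size are hypothetical.

```python
def to_pax_groups(rows, columns, group_size):
    """Split rows into groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        groups.append({col: [row[col] for row in chunk] for col in columns})
    return groups


def scan_column(groups, column):
    """Column-style access: read one attribute without touching the others."""
    for group in groups:
        yield from group[column]


def fetch_row(groups, columns, index, group_size):
    """Row-style access: every field of a row is found inside a single group."""
    group = groups[index // group_size]
    offset = index % group_size
    return {col: group[col][offset] for col in columns}


columns = ["id", "price"]
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 3.0},
    {"id": 3, "price": 7.25},
]

groups = to_pax_groups(rows, columns, group_size=2)
prices = list(scan_column(groups, "price"))
third_row = fetch_row(groups, columns, index=2, group_size=2)
```

With three rows and a group size of two, the data ends up in two groups: the first holds two rows and the second holds one.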

Capabilities

  • Efficiently handle petabyte-scale data warehousing
  • Run complex analytics and machine learning workloads
  • Migrate from Greenplum using gpbackup and related tools
  • Deploy across distributed environments with coordinated segment management
  • Customize and extend via GitHub-based contribution workflows
  • Utilize advanced storage and query optimization techniques
  • Integrate with enterprise-grade monitoring and security tools

Benefits

  • Modern PostgreSQL foundation ensures long-term maintainability
  • Open-source and community-driven development model
  • Enterprise-ready features for analytics, AI, and ML
  • Compatibility with existing Greenplum deployments
  • Scalable architecture for growing data needs
  • Active community and contributor support
  • Transparent roadmap and incubation progress