Apache HBase is a distributed, scalable, non-relational database built on top of Hadoop and HDFS. It provides real-time read/write access to large datasets and is designed to host massive tables with billions of rows and millions of columns across commodity hardware.

Vendor

The Apache Software Foundation

Product details

Apache HBase

Apache HBase is an open-source, distributed, scalable, and versioned non-relational database modeled after Google's Bigtable. It is designed to provide real-time read/write access to large datasets hosted on clusters of commodity hardware. Built on top of Hadoop and HDFS, HBase supports billions of rows and millions of columns, making it ideal for applications requiring high throughput and low latency.

Features

  • Linear and Modular Scalability: Easily scales horizontally by adding more nodes.
  • Strict Consistency: Reads and writes to a row are strictly consistent, because each row is served by a single RegionServer at a time.
  • Automatic Sharding: Tables are automatically split and distributed across RegionServers.
  • Failover Support: Built-in mechanisms for automatic failover between RegionServers.
  • Java API: Provides a straightforward API for client access and integration.
  • MapReduce Integration: Base classes for backing Hadoop MapReduce jobs with HBase tables.
  • Caching and Filtering: Includes block cache and Bloom filters for efficient queries.
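  • Predicate Pushdown: Query predicates are pushed down to the RegionServers as server-side filters, so less data is returned to the client.
  • Thrift and REST Interfaces: A Thrift gateway and a RESTful web service for non-Java clients; the REST service supports XML, Protobuf, and binary data encodings.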
  • Shell Access: Extensible JRuby-based shell for interactive operations.
  • Metrics Export: Exports metrics through the Hadoop metrics subsystem to files or Ganglia, or through JMX.
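
To make the automatic sharding feature above more concrete, here is a minimal Python sketch of range-based sharding: a table is kept as contiguous, sorted row-key ranges ("regions"), and a region is split in two once it grows past a threshold. This is a toy model, not HBase code. The class name `ToyTable`, the `max_rows_per_region` threshold, and the split-at-the-middle rule are all assumptions made for illustration; HBase decides splits by its own configurable policies.

```python
import bisect


class ToyTable:
    """Toy model of range sharding. Illustration only, not HBase's implementation."""

    def __init__(self, max_rows_per_region=4):
        # Hypothetical split threshold, counted in rows to keep the sketch simple.
        self.max_rows = max_rows_per_region
        # Sorted region start keys; "" means "from the lowest possible key".
        self.start_keys = [""]
        # One sorted list of row keys per region, parallel to start_keys.
        self.regions = [[]]

    def _region_index(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def locate(self, row_key):
        """Return the start key of the region that owns row_key."""
        return self.start_keys[self._region_index(row_key)]

    def put(self, row_key):
        i = self._region_index(row_key)
        rows = self.regions[i]
        pos = bisect.bisect_left(rows, row_key)
        if pos == len(rows) or rows[pos] != row_key:
            rows.insert(pos, row_key)  # keep the region sorted, ignore duplicates
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Split the region at its middle row key into two adjacent ranges.
        rows = self.regions[i]
        mid = len(rows) // 2
        left, right = rows[:mid], rows[mid:]
        self.regions[i] = left
        self.regions.insert(i + 1, right)
        self.start_keys.insert(i + 1, right[0])


table = ToyTable(max_rows_per_region=4)
for n in range(10):
    table.put(f"row{n:02d}")

print(table.start_keys)        # ['', 'row02', 'row04', 'row06']
print(table.locate("row05"))   # row04
```

Because regions are contiguous key ranges, a lookup only needs the sorted list of start keys to find the owning region. The sketch also shows why sequential row keys concentrate all new writes in the last region.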

Capabilities

  • Real-Time Access: Designed for random, real-time read/write operations on massive datasets.
  • Big Data Storage: Handles extremely large tables with billions of rows and millions of columns.
  • Distributed Architecture: Operates across clusters with high fault tolerance and availability.
  • Flexible Data Model: Column families are defined in the table schema, while columns (qualifiers) within a family can be added at write time without a schema change.
  • Integration with Hadoop Ecosystem: Seamlessly works with HDFS, MapReduce, and other Hadoop tools.
  • Multi-Versioned Storage: Keeps multiple timestamped versions of each cell, so earlier values can be read back; the number of versions retained is configurable per column family.
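
The data model and versioning described above can be pictured as a map from (row, column family, qualifier) to a list of timestamped values. The following Python sketch is a toy in-memory model under that assumption, not the HBase client API. `ToyVersionedStore`, its `max_versions` parameter, and the inclusive `max_timestamp` argument are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class ToyVersionedStore:
    """Toy model of a versioned, column-family data model. Not the HBase API."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)      # families are fixed up front
        self.max_versions = max_versions   # hypothetical per-store retention limit
        # (row, family, qualifier) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, family, qualifier, value, timestamp):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.cells[(row, family, qualifier)]
        # A write with an existing timestamp replaces that version.
        versions[:] = [(ts, v) for ts, v in versions if ts != timestamp]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]   # drop versions beyond the limit

    def get(self, row, family, qualifier, max_timestamp=None):
        """Return the newest value, or the newest one at or before max_timestamp."""
        for ts, value in self.cells.get((row, family, qualifier), []):
            if max_timestamp is None or ts <= max_timestamp:
                return value
        return None


store = ToyVersionedStore(families={"info"}, max_versions=2)
store.put("user1", "info", "email", "a@example.com", timestamp=1)
store.put("user1", "info", "email", "b@example.com", timestamp=2)
store.put("user1", "info", "email", "c@example.com", timestamp=3)

print(store.get("user1", "info", "email"))                   # c@example.com
print(store.get("user1", "info", "email", max_timestamp=2))  # b@example.com
print(store.get("user1", "info", "email", max_timestamp=1))  # None (version pruned)
```

Qualifiers such as `email` are never declared in advance, while writing to an undeclared family fails. That mirrors the split between a fixed set of column families and dynamic columns.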

Benefits

  • High Performance: Optimized for low-latency access and high throughput.
  • Scalable Infrastructure: Grows with your data needs without major architectural changes.
  • Cost-Effective: Runs on commodity hardware, reducing infrastructure costs.
  • Reliable and Resilient: Built-in failover and recovery mechanisms ensure data availability.
  • Developer Friendly: Rich APIs, shell tools, and integration options simplify development.
  • Enterprise Ready: Proven in production environments for real-time analytics and data warehousing.