
Apache Accumulo is a distributed key/value store that provides scalable and secure data storage. It supports fine-grained access control, efficient data compression, and server-side processing, making it suitable for large-scale applications requiring high performance and flexible data management.

Vendor

The Apache Software Foundation

Product details

Apache Accumulo

Apache Accumulo is a highly scalable, sorted, distributed key/value store built on top of Apache Hadoop’s HDFS. It is designed for robust and secure data storage and retrieval across large clusters. Inspired by Google’s Bigtable, Accumulo supports fine-grained access control, efficient data compression, and server-side programming, which suits it to applications that need high performance, scalability, and security.

Features

  • Sorted Key/Value Storage: Data is stored as sorted key/value pairs, enabling fast range queries and efficient scans.
  • Cell-Level Security: Each key/value pair includes a visibility label, so access is controlled per cell based on the authorizations of the user reading it.
  • Server-Side Programming with Iterators: Custom logic can be applied during data scans, compactions, and other operations using iterators.
  • Automatic Load Balancing and Partitioning: Data is automatically partitioned and distributed across tablet servers for performance and scalability.
  • Compression and Versioning: Supports data compression, and keeps multiple versions of a value by including a timestamp in each key.
  • Stable Client API: The client API follows semantic versioning, and long-term maintenance releases are provided for reliable integration.
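
The sorted storage, timestamp versioning, and cell-level security features above can be illustrated with a small, self-contained Python sketch. This is not the Accumulo client API: `ToyTable`, its methods, and the key layout `(row, column, -timestamp)` are hypothetical names chosen for illustration. It also assumes a simplified visibility rule in which a cell is readable only when every one of its labels appears in the reader's authorizations, whereas Accumulo visibility labels are richer expressions.

```python
import bisect


class ToyTable:
    """Hypothetical in-memory stand-in for a sorted key/value table."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._cells = {}   # key -> (visibility labels, value)

    def put(self, row, column, timestamp, visibility, value):
        # Negating the timestamp makes newer versions sort before older ones.
        key = (row, column, -timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = (frozenset(visibility), value)

    def scan(self, start_row, end_row, authorizations):
        """Yield the newest readable version of each cell with
        start_row <= row < end_row."""
        auths = set(authorizations)
        # Binary search to the first key at or after start_row.
        i = bisect.bisect_left(self._keys, (start_row,))
        seen = set()
        while i < len(self._keys) and self._keys[i][0] < end_row:
            key = self._keys[i]
            row, column, neg_ts = key
            labels, value = self._cells[key]
            # Simplified rule (assumption): all labels must be authorized.
            if labels <= auths and (row, column) not in seen:
                seen.add((row, column))
                yield row, column, -neg_ts, value
            i += 1


table = ToyTable()
table.put("user1", "name", 1, [], "Ann")
table.put("user1", "name", 2, [], "Anna")      # newer version of the same cell
table.put("user1", "ssn", 1, ["pii"], "123")   # protected by the "pii" label
table.put("user2", "name", 1, [], "Bob")

# A reader without authorizations sees only the unlabeled cell in the range.
print(list(table.scan("user1", "user2", authorizations=[])))
# A reader holding "pii" also sees the protected cell.
print(list(table.scan("user1", "user2", authorizations=["pii"])))
```

Because the keys are kept in sorted order, a range scan is a binary search to the start row followed by a sequential read, which is the property that makes range queries cheap in a sorted store.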

Capabilities

  • Distributed Architecture: Operates across clusters, using HDFS for storage and ZooKeeper for coordination and fault tolerance.
  • Tablet Server Management: Tablet servers handle data ingestion, sorting, flushing to disk, and recovery from failures.
  • MapReduce Integration: Accumulo tables can be used as input/output for Hadoop MapReduce jobs.
  • Garbage Collection: Identifies and removes unused files to maintain storage efficiency.
  • Master Server Coordination: The master (named the manager since Accumulo 2.1) oversees tablet server health and orchestrates recovery and load balancing.
  • Mini Accumulo Cluster: Enables local testing and development with a lightweight cluster setup.
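
The tablet server responsibilities listed above (ingestion, sorting, and flushing to disk) follow a common write-path pattern, sketched below in plain Python. This is a conceptual model only, not Accumulo code: `ToyTablet`, `flush_threshold`, and the use of Python lists in place of files on HDFS are all assumptions made for illustration, and recovery from failures is left out.

```python
class ToyTablet:
    """Hypothetical model of one tablet: an in-memory buffer of recent
    writes plus a list of flushed, sorted, immutable runs."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memory = {}   # recent writes, not yet flushed
        self.files = []    # flushed runs, oldest first; each a sorted list of (key, value)

    def write(self, key, value):
        """Ingest one entry; flush once the buffer reaches the threshold."""
        self.memory[key] = value
        if len(self.memory) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Sort the buffered entries and persist them as a new run."""
        if self.memory:
            self.files.append(sorted(self.memory.items()))
            self.memory = {}

    def scan(self):
        """Merge all runs and the buffer; newer data overrides older data."""
        merged = {}
        for run in self.files:        # oldest first, so later runs win
            merged.update(run)
        merged.update(self.memory)    # unflushed writes are the newest
        return sorted(merged.items())

    def compact(self):
        """Rewrite every run into a single sorted run."""
        self.flush()
        self.files = [self.scan()]


tablet = ToyTablet(flush_threshold=2)
tablet.write("b", 1)
tablet.write("a", 2)      # second write reaches the threshold and triggers a flush
tablet.write("a", 3)      # newer value for "a" stays in memory for now
print(tablet.scan())      # [('a', 3), ('b', 1)]
tablet.compact()
print(tablet.files)       # [[('a', 3), ('b', 1)]]
```

Keeping flushed runs immutable and merging them at read time keeps writes cheap, while compaction bounds the number of runs a scan has to merge.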

Benefits

  • Scalability: Scales with data growth by adding nodes to the cluster.
  • Security: Fine-grained access control helps protect data confidentiality and supports compliance requirements.
  • Performance: Optimized for fast reads and writes with efficient memory and disk usage.
  • Flexibility: Supports custom data processing and integration with other big data tools.
  • Reliability: Built-in fault tolerance and recovery mechanisms support high availability.