Name: Apache Hive
Brand: The Apache Software Foundation


Apache Hive is a distributed data warehouse system built on Apache Hadoop. It enables reading, writing, and managing large datasets stored in distributed systems using SQL. Hive supports data warehousing tasks like ETL, reporting, and analysis, and integrates with various storage formats and engines.

Vendor

The Apache Software Foundation

Company Website

https://hive.apache.org

YouTube

https://www.youtube.com/c/TheApacheFoundation

Product details

Apache Hive

Apache Hive is a distributed, fault-tolerant data warehouse system built on top of Apache Hadoop. It enables scalable analytics by allowing users to read, write, and manage large datasets stored in distributed systems using SQL. Hive is designed for traditional data warehousing tasks such as ETL, reporting, and data analysis, and integrates with various storage formats and engines.

Features

SQL-based query language with support for SQL:2003, SQL:2011, and SQL:2016 features
Extensibility through user-defined functions (UDFs), aggregates (UDAFs), and table functions (UDTFs)
Support for multiple file formats including CSV, TSV, Parquet, ORC, and custom formats
Execution on Apache Tez, with LLAP (Live Long and Process) daemons for low-latency queries; the MapReduce engine is still available but deprecated
HiveServer2 for multi-client concurrency and authentication
Hive Metastore (HMS) for centralized metadata management
Integration with Apache Iceberg for cloud-native table formats
Built-in support for ACID transactions on ORC tables
Data compaction and replication capabilities
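
The three extension points differ mainly in how many rows go in and how many come out. The sketch below illustrates those shapes with plain Python stand-ins. It is not Hive's API: real Hive functions are Java classes written against Hive's own interfaces (for example GenericUDF, GenericUDAFEvaluator and GenericUDTF). The names udf_upper, AvgUDAF and udtf_explode are hypothetical. Only the aggregate's method names loosely follow the iterate / terminatePartial / merge / terminate lifecycle of GenericUDAFEvaluator.

```python
from typing import Iterator, List, Optional, Tuple

# Conceptual sketch only: hypothetical Python stand-ins for Hive's three
# function shapes. Real Hive UDFs/UDAFs/UDTFs are Java classes.


def udf_upper(value: Optional[str]) -> Optional[str]:
    """UDF shape: one input row -> one output value (NULL in, NULL out)."""
    return None if value is None else value.upper()


class AvgUDAF:
    """UDAF shape: many rows -> one value, computed via partial aggregates.

    Method names loosely mirror GenericUDAFEvaluator's lifecycle.
    """

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def iterate(self, value: Optional[float]) -> None:
        # Consume one raw row on a map-side task; NULLs are skipped.
        if value is not None:
            self.total += value
            self.count += 1

    def terminate_partial(self) -> Tuple[float, int]:
        # Emit intermediate state that another task can merge.
        return (self.total, self.count)

    def merge(self, partial: Tuple[float, int]) -> None:
        # Fold in intermediate state produced by another task.
        self.total += partial[0]
        self.count += partial[1]

    def terminate(self) -> Optional[float]:
        # Produce the final result (NULL if no non-NULL rows were seen).
        return None if self.count == 0 else self.total / self.count


def udtf_explode(items: Optional[List[str]]) -> Iterator[str]:
    """UDTF shape: one input row -> zero or more output rows."""
    for item in items or []:
        yield item


# Two "map" tasks aggregate separate slices; one "reduce" task merges them.
m1, m2, reducer = AvgUDAF(), AvgUDAF(), AvgUDAF()
for v in [1.0, 2.0, None]:
    m1.iterate(v)
for v in [3.0, 6.0]:
    m2.iterate(v)
reducer.merge(m1.terminate_partial())
reducer.merge(m2.terminate_partial())

print(udf_upper("hive"))                # HIVE
print(reducer.terminate())              # 3.0
print(list(udtf_explode(["a", "b"])))   # ['a', 'b']
```

The split between terminate_partial and merge is what lets an aggregate run in parallel: each task summarizes its own slice, and only the small (total, count) pairs need to be combined.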

Capabilities

Scalable query processing over petabytes of data
Structured access to semi-structured and unstructured data
Metadata-driven architecture for interoperability with tools like Spark, Impala, and Presto
Interactive SQL queries via LLAP, with sub-second response times for suitable workloads
Cost-based query optimization using Apache Calcite
Secure access with Kerberos authentication, plus integration with Apache Ranger for authorization and Apache Atlas for governance and lineage
REST-style interface for metadata operations via WebHCat
Support for streaming data ingestion and replication for backup and recovery
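
Structured access to semi-structured data works by applying a schema when the data is read, not when it is stored. In Hive this is done by a SerDe (serializer/deserializer) attached to the table. As one example, RegexSerDe maps each capture group of its "input.regex" table property to a column, in order. The Python sketch below imitates that idea on a few log lines. The column names, log format and sample lines are hypothetical. Mapping non-matching lines to an all-None row is a choice made for this sketch, not a statement of Hive's exact behaviour.

```python
import re
from typing import Dict, List, Optional

# Conceptual schema-on-read sketch in the spirit of Hive's RegexSerDe.
# Hypothetical table: columns and log format are invented for illustration.
COLUMNS = ["host", "ts", "status", "path"]
INPUT_REGEX = re.compile(r"^(\S+) \[([^\]]+)\] (\d{3}) (\S+)$")


def deserialize(line: str) -> Dict[str, Optional[str]]:
    """Map regex capture groups to columns in order.

    In this sketch, lines that do not match become a row of None values.
    """
    match = INPUT_REGEX.match(line)
    if match is None:
        return {col: None for col in COLUMNS}
    return dict(zip(COLUMNS, match.groups()))


# Hypothetical raw data, stored as plain text with no schema attached.
raw_lines: List[str] = [
    "10.0.0.1 [2024-05-01T10:00:00] 200 /index.html",
    "10.0.0.2 [2024-05-01T10:00:05] 404 /missing",
    "garbage line",
]

# The schema is applied only now, at read time.
rows = [deserialize(line) for line in raw_lines]

# Roughly what SELECT path FROM logs WHERE status = '404' would evaluate.
not_found = [r["path"] for r in rows if r["status"] == "404"]
print(not_found)  # ['/missing']
```

Because the raw files are never rewritten, the same data can be re-read with a different regex or SerDe simply by changing the table definition.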

Benefits

Simplifies big data analytics with familiar SQL syntax
Enables efficient data warehousing on Hadoop-based infrastructure
Reduces latency for interactive queries
Enhances data governance and security
Facilitates integration with modern data lake architectures
Offers flexibility in data format and storage options
Scales horizontally as nodes are added to the cluster
Open-source and community-driven development
