Apache Drill is a schema-free SQL query engine for big data exploration. It enables high-performance analysis on semi-structured data without requiring predefined schemas, supporting standard SQL and integration with BI tools and various NoSQL and cloud storage systems.

Vendor

The Apache Software Foundation

Product details

Apache Drill

Apache Drill is a schema-free, distributed SQL query engine designed for interactive analysis of large-scale datasets, including structured, semi-structured, and nested data. Inspired by Google’s Dremel, Drill enables high-performance querying without requiring centralized metadata or schema definitions. It supports dynamic schema discovery and integrates seamlessly with Hadoop, NoSQL databases, and cloud storage systems.

Features

  • Schema-free SQL querying on self-describing data formats like JSON, Parquet, and Avro
  • ANSI SQL support with extensions for nested and complex data
  • Integration with Apache Hive, HBase, and various NoSQL and cloud storage systems
  • JDBC and ODBC drivers for BI tool compatibility (e.g., Tableau, Excel, Qlik)
  • In-memory columnar execution engine with support for complex data
  • Dynamic query compilation and re-compilation for performance optimization
  • REST API for custom application integration
  • Drill Web UI and shell for interactive query execution
  • Storage plugin architecture for extensibility and custom data source support
  • Support for advanced SQL features like joins, nested queries, and metadata introspection
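
As an illustration of the REST API listed above, the following Python sketch builds a query submission request. It assumes a Drillbit whose web interface listens on localhost at port 8047 with authentication disabled, and that queries are posted as JSON to /query.json. The helper names build_query_request and run_query are hypothetical and are not part of Drill.

```python
import json
import urllib.request

# Assumption: a local Drillbit with its web port at the usual default of 8047.
DRILL_URL = "http://localhost:8047"


def build_query_request(sql, base_url=DRILL_URL):
    """Build (but do not send) a POST request that submits one SQL query."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=DRILL_URL):
    """Send the query and return the result rows (requires a running Drillbit)."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as resp:
        # Assumption: the JSON response carries the result set under "rows".
        return json.load(resp).get("rows", [])
```

Against a running instance with the classpath storage plugin enabled, a call such as run_query("SELECT * FROM cp.`employee.json` LIMIT 3") should return a few rows from the sample file bundled with Drill. Keeping request construction separate from sending makes the payload easy to inspect without a server.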

Capabilities

  • Query data in-situ without loading, transforming, or defining schemas
  • Join data across multiple heterogeneous sources in a single query
  • Scale from a single laptop to thousands of nodes in a distributed cluster
  • Perform ad-hoc queries on petabyte-scale datasets with low latency
  • Discover and adapt to changing schemas during query execution
  • Optimize query plans using datastore-aware execution and data locality
  • Access nested attributes as SQL columns with intuitive syntax
  • Run in any distributed environment, using Apache ZooKeeper to coordinate cluster membership
  • Extend functionality through plugins for storage, query execution, and client APIs
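
To make the in-situ, nested-attribute, and cross-source capabilities above more concrete, here is a small Python sketch that composes the kind of SQL text Drill accepts. The file path /data/users.json, the Hive table orders, the column names, and the helpers plugin_table and nested_column are all hypothetical. The storage plugin names dfs and hive are assumed to be configured under those names.

```python
def plugin_table(plugin, name):
    """Reference a file or table through a storage plugin, e.g. dfs.`/data/users.json`."""
    return f"{plugin}.`{name}`"


def nested_column(alias, *path):
    """Build a dotted reference to a nested attribute, e.g. u.address.city.

    The table alias comes first so the path is not mistaken for table.column.
    """
    return ".".join((alias, *path))


# Hypothetical JSON file queried in place, with no load step or schema definition.
users = plugin_table("dfs", "/data/users.json")

# Nested attributes selected as if they were ordinary SQL columns.
nested_query = (
    f"SELECT {nested_column('u', 'name')} AS name, "
    f"{nested_column('u', 'address', 'city')} AS city "
    f"FROM {users} AS u"
)

# One query joining a hypothetical Hive table with the JSON file.
join_query = (
    f"SELECT o.order_id, {nested_column('u', 'address', 'city')} AS city "
    f"FROM {plugin_table('hive', 'orders')} AS o "
    f"JOIN {users} AS u ON o.user_id = u.id"
)

print(nested_query)
print(join_query)
```

The resulting strings could be submitted through the shell, the Web UI, a JDBC or ODBC connection, or the REST API. Whether they succeed depends on the plugins and data actually present in a given installation.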

Benefits

  • Eliminates the overhead of schema management and data preparation
  • Enables rapid data exploration and agile analytics workflows
  • Reduces dependency on IT and database administrators
  • Enhances flexibility for modern applications with evolving data structures
  • Improves performance through columnar execution and memory optimization
  • Supports familiar SQL and BI tools for a seamless user experience
  • Facilitates integration with diverse data ecosystems
  • Offers extensibility for custom enterprise use cases
  • Provides decentralized metadata management for multi-source querying
  • Backed by a robust open-source community and Apache governance