Apache Drill is a schema-free SQL query engine for big data exploration. It enables high-performance analysis on semi-structured data without requiring predefined schemas, supporting standard SQL and integration with BI tools and various NoSQL and cloud storage systems.

Vendor

The Apache Software Foundation

Product details

Apache Drill

Apache Drill is a schema-free, distributed SQL query engine designed for interactive analysis of large-scale datasets, including structured, semi-structured, and nested data. Inspired by Google’s Dremel, Drill enables high-performance querying without requiring centralized metadata or schema definitions. It supports dynamic schema discovery and integrates seamlessly with Hadoop, NoSQL databases, and cloud storage systems.

Features

  • Schema-free SQL querying on self-describing data formats like JSON, Parquet, and Avro
  • ANSI SQL support with extensions for nested and complex data
  • Integration with Apache Hive, HBase, and various NoSQL and cloud storage systems
  • JDBC and ODBC drivers for BI tool compatibility (e.g., Tableau, Excel, Qlik)
  • In-memory columnar execution engine with support for complex data
  • Dynamic query compilation and re-compilation for performance optimization
  • REST API for custom application integration
  • Drill Web UI and shell for interactive query execution
  • Storage plugin architecture for extensibility and custom data source support
  • Support for advanced SQL features like joins, nested queries, and metadata introspection
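
As an illustration of the REST API and schema-free querying listed above, the following minimal Python sketch builds a query request for a Drillbit. It assumes a Drillbit on localhost with the web port at its default of 8047, the `dfs` file-system storage plugin, and a hypothetical file `/tmp/events.json`; the helper name `build_drill_request` is illustrative only. The request is constructed but not sent, so the sketch runs without a Drill installation.

```python
import json
import urllib.request

# Assumption: a Drillbit running locally with the web/REST port at its default (8047).
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical path; no schema is declared anywhere before querying the raw JSON file.
sql = "SELECT * FROM dfs.`/tmp/events.json` LIMIT 10"
request = build_drill_request(sql)

# To run against a live Drillbit, uncomment:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the example testable offline; a real client would also handle authentication if the cluster has it enabled.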

Capabilities

  • Query data in-situ without loading, transforming, or defining schemas
  • Join data across multiple heterogeneous sources in a single query
  • Scale from a single laptop to thousands of nodes in a distributed cluster
  • Perform ad-hoc queries on petabyte-scale datasets with low latency
  • Discover and adapt to changing schemas during query execution
  • Optimize query plans using datastore-aware execution and data locality
  • Access nested attributes as SQL columns with intuitive syntax
  • Run in distributed mode on any cluster, using ZooKeeper for coordination
  • Extend functionality through plugins for storage, query execution, and client APIs
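
Two of the capabilities above, nested-attribute access and cross-source joins, are easiest to see in query form. The sketch below only assembles the SQL strings in Python; the file paths, the field names (`customer.address.city`, `customer_id`, `id`, `name`, `total`) and the `hive.orders` table are hypothetical, and it assumes storage plugins named `dfs` and `hive` are configured.

```python
def nested_field_query(path):
    """SQL that reads a nested JSON attribute as an ordinary column.

    Drill needs a table alias (here `t`) so that `t.customer.address.city`
    is parsed as a path into nested data rather than as schema.table.column.
    """
    return (
        "SELECT t.customer.address.city AS city "
        f"FROM dfs.`{path}` AS t"
    )


def cross_source_join_query(customers_path):
    """SQL joining a (hypothetical) Hive table with a raw JSON file in one query."""
    return (
        "SELECT c.name, o.total "
        "FROM hive.orders AS o "
        f"JOIN dfs.`{customers_path}` AS c "
        "ON o.customer_id = c.id"
    )


nested_sql = nested_field_query("/data/orders.json")
join_sql = cross_source_join_query("/data/customers.json")
```

Either string can be submitted through the Web UI, the shell, a JDBC/ODBC client, or the REST API.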

Benefits

  • Eliminates the overhead of schema management and data preparation
  • Enables rapid data exploration and agile analytics workflows
  • Reduces dependency on IT and database administrators
  • Enhances flexibility for modern applications with evolving data structures
  • Improves performance through columnar execution and memory optimization
  • Supports familiar SQL and BI tools for a seamless user experience
  • Facilitates integration with diverse data ecosystems
  • Offers extensibility for custom enterprise use cases
  • Provides decentralized metadata management for multi-source querying
  • Backed by a robust open-source community and Apache governance