
Apache ORC is a columnar storage format optimized for big data processing. It provides efficient compression, indexing, and fast data access, making it ideal for analytics workloads in Hadoop-based systems. ORC supports complex types and is designed for high performance and scalability.

Vendor


The Apache Software Foundation


Product details

Apache ORC

Apache ORC (Optimized Row Columnar) is a high-performance columnar storage format designed for big data processing in the Hadoop ecosystem. It provides efficient compression, fast data access, and rich indexing capabilities, making it ideal for large-scale analytics workloads. ORC is self-describing and type-aware, supporting complex data types and optimized for streaming reads.

Features

  • Columnar storage format optimized for Hadoop
  • Built-in lightweight indexes: min/max statistics at the file, stripe, and row-group level, plus optional Bloom filters
  • Support for complex types: structs, lists, maps, and unions
  • Lightweight metadata for fast schema discovery
  • Predicate pushdown for efficient query filtering
  • ACID transaction support and snapshot isolation
  • Stripe-based file structure for parallel processing
  • General-purpose compression codecs (zlib, Snappy, LZO, LZ4, Zstandard) layered on type-specific encodings for a smaller storage footprint
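The min/max statistics and predicate pushdown listed above work together: a reader compares a query's range predicate against each stripe's statistics and skips any stripe that cannot contain a matching row. The sketch below illustrates that idea in plain Python under simplifying assumptions. `StripeStats` and `stripes_to_read` are hypothetical names, not part of any ORC library, and real ORC readers also evaluate predicates against finer-grained row-group statistics and can consult Bloom filters for equality lookups.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StripeStats:
    """Hypothetical per-stripe statistics for one integer column (min/max only)."""
    stripe_id: int
    min_value: int
    max_value: int


def stripes_to_read(stats: List[StripeStats],
                    lo: Optional[int] = None,
                    hi: Optional[int] = None) -> List[int]:
    """Return ids of stripes whose [min, max] range may satisfy lo <= x <= hi.

    A stripe is skipped only when its statistics prove no row can match;
    otherwise it must be read and filtered row by row.
    """
    selected = []
    for s in stats:
        if lo is not None and s.max_value < lo:
            continue  # every value in this stripe is below the lower bound
        if hi is not None and s.min_value > hi:
            continue  # every value in this stripe is above the upper bound
        selected.append(s.stripe_id)
    return selected


if __name__ == "__main__":
    stats = [
        StripeStats(0, 1, 100),
        StripeStats(1, 101, 200),
        StripeStats(2, 201, 300),
    ]
    # Predicate 150 <= x <= 250: stripe 0 tops out at 100, so it is skipped.
    print(stripes_to_read(stats, lo=150, hi=250))  # [1, 2]
```

Skipping is conservative by design: statistics can only rule a stripe out, never confirm that a particular row matches.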

Capabilities

  • Seamless integration with Apache Hive, Spark, and other Hadoop tools
  • Efficient read and write operations for large datasets
  • Type-aware encodings (for example, run-length encoding for integers and dictionary encoding for strings)
  • Supports schema evolution and backward compatibility
  • Enables distributed processing with independent file stripes
  • Java and C++ libraries for reading and writing ORC files directly in custom applications
  • Designed for high-throughput and low-latency analytics
  • Supports vectorized query execution for faster performance
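Type-aware encoding means the writer picks an encoding suited to each column's data type before general-purpose compression is applied. As a much-simplified illustration of one such idea, the hypothetical functions below collapse repeated integers into (value, run length) pairs and expand them back. This is not ORC's actual run-length encoding, whose on-disk format also handles deltas, bit-packing, and byte-level layout.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))  # start a new run
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


if __name__ == "__main__":
    column = [7, 7, 7, 7, 3, 3, 9]
    encoded = rle_encode(column)
    print(encoded)  # [(7, 4), (3, 2), (9, 1)]
    assert rle_decode(encoded) == column
```

Sorted or low-cardinality columns tend to produce long runs, which is one reason encoding values by type before compressing them can shrink files noticeably.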

Benefits

  • Reduces storage costs through advanced compression
  • Accelerates query performance with built-in indexing
  • Improves scalability for big data applications
  • Enhances data integrity with ACID compliance
  • Simplifies data management with self-describing files
  • Open-source and actively maintained under the Apache License
  • Trusted by major organizations like Facebook and Yahoo for petabyte-scale data