Name: Apache ORC
Brand: The Apache Software Foundation


Apache ORC is a columnar storage format optimized for big data processing. It provides efficient compression, indexing, and fast data access, making it ideal for analytics workloads in Hadoop-based systems. ORC supports complex types and is designed for high performance and scalability.

Vendor

The Apache Software Foundation

Company Website

https://orc.apache.org

YouTube

https://www.youtube.com/c/TheApacheFoundation

Product details

Apache ORC

Apache ORC (Optimized Row Columnar) is a high-performance columnar storage format designed for big data processing in the Hadoop ecosystem. It provides efficient compression, fast data access, and rich indexing capabilities, making it well suited to large-scale analytics workloads. ORC files are self-describing and type-aware, support complex data types, and are optimized for streaming reads.

Features

Columnar storage format optimized for Hadoop
Built-in indexes including min/max values and bloom filters
Support for complex types: structs, lists, maps, and unions
Lightweight metadata for fast schema discovery
Predicate pushdown for efficient query filtering
ACID transaction support and snapshot isolation when used with Apache Hive
Stripe-based file structure for parallel processing
Advanced compression techniques for reduced storage footprint
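
The min/max indexes, stripe layout, and predicate pushdown listed above work together: a reader can skip any stripe whose statistics rule out a match. The following is a minimal sketch of that idea in plain Python, not ORC's actual file format or API. The names Stripe, write_stripes, and scan_greater_than are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Toy stand-in for a stripe: one column's values plus min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def write_stripes(column: List[int], stripe_size: int) -> List[Stripe]:
    """Split a column into fixed-size stripes and record min/max for each."""
    stripes = []
    for start in range(0, len(column), stripe_size):
        chunk = column[start:start + stripe_size]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes


def scan_greater_than(stripes: List[Stripe], threshold: int) -> Tuple[List[int], int]:
    """Return the values above threshold and the number of stripes actually read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # Pushdown step: if the stripe's maximum cannot satisfy the
        # predicate, skip the stripe without touching its values.
        if stripe.max_value <= threshold:
            continue
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


# Hypothetical sample data: an "age" column split into three stripes.
ages = [21, 25, 30, 34, 41, 47, 52, 58, 63, 70, 75, 81]
stripes = write_stripes(ages, stripe_size=4)
matches, stripes_read = scan_greater_than(stripes, threshold=60)
print(f"matches={matches}, stripes read={stripes_read} of {len(stripes)}")
```

In this toy run only the last of the three stripes is read. Real ORC files store comparable statistics in a binary format, so the saving grows with file size and with how well the data is sorted or clustered on the filtered column.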

Capabilities

Seamless integration with Apache Hive, Spark, and other Hadoop tools
Efficient read and write operations for large datasets
Type-aware encoding for optimal performance
Supports schema evolution and backward compatibility
Enables distributed processing with independent file stripes
Java and C++ libraries for reading and writing ORC files directly
Designed for high-throughput and low-latency analytics
Supports vectorized query execution for faster performance
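
Type-aware encoding means the writer picks an encoding based on each column's type, for example run-length encoding for integers and dictionary encoding for strings. The sketch below shows simplified versions of both in plain Python. It is an illustration under stated assumptions, not ORC's real encoders, which are more elaborate; the function names and sample columns are hypothetical.

```python
from typing import List, Tuple


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats of an integer into (value, count) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with its index in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    index = {s: i for i, s in enumerate(dictionary)}
    return dictionary, [index[s] for s in values]


# Hypothetical sample columns.
status_codes = [200, 200, 200, 404, 404, 200]
countries = ["DE", "US", "US", "DE", "FR", "US"]

runs = run_length_encode(status_codes)
dictionary, ids = dictionary_encode(countries)

print(runs)        # [(200, 3), (404, 2), (200, 1)]
print(dictionary)  # ['DE', 'FR', 'US']
print(ids)         # [0, 2, 2, 0, 1, 2]
```

Both encodings pay off because a columnar layout places values of the same type, often with many repeats, next to each other.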

Benefits

Reduces storage costs through advanced compression
Accelerates query performance with built-in indexing
Improves scalability for big data applications
Enhances data integrity with ACID compliance
Simplifies data management with self-describing files
Open source and actively maintained, released under the Apache License 2.0
Trusted by major organizations like Facebook and Yahoo for petabyte-scale data
