
Apache Tez
Apache Tez is a data processing framework built on Hadoop YARN that executes complex directed acyclic graphs (DAGs) of tasks. It can replace a chain of MapReduce jobs with a single Tez job, improving performance, resource efficiency, and flexibility for big data applications such as Hive and Pig.
Vendor
The Apache Software Foundation
Product details
Apache Tez
Apache Tez is a data processing framework built on top of Apache Hadoop YARN. It executes complex directed acyclic graphs (DAGs) of tasks, offering an alternative to traditional MapReduce for big data applications. Tez is designed to optimize performance, resource usage, and flexibility for data-intensive applications such as Apache Hive and Apache Pig.
Features
- Expressive DAG-based Execution Model: Allows users to define complex data processing workflows using a flexible DAG structure.
- Flexible Input-Processor-Output Runtime Model: Supports custom implementations of inputs, outputs, and processors for tailored data handling.
- Data Type Agnostic: Concerned only with moving data between tasks, not with the data format (key-value pairs, tuple-oriented formats, and so on).
- Tez UI: A dedicated web-based interface for monitoring live and historical Tez applications, integrated with YARN’s Application Timeline Server.
- Tez Shuffle Handler: Optimized data shuffling for DAGs, supporting features like auto-reduce parallelism and intermediate data cleanup.
- Dynamic Plan Reconfiguration: Enables runtime decisions for physical data flow and resource allocation.
- Tez Sessions: Supports long-running sessions to execute multiple DAGs efficiently.
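The DAG-based execution model listed above can be illustrated with a small conceptual sketch. This is not the Tez API: the vertex names are hypothetical, and the "wave" grouping is a simplification used only to show how dependencies between vertices determine what can run in parallel. It assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical vertices; each key maps to the set of vertices it reads from.
dag = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "sort": {"aggregate"},
}


def execution_waves(graph):
    """Group vertices into waves; vertices in one wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # vertices whose inputs are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dag)
for index, wave in enumerate(waves):
    print(index, wave)
```

In this sketch the two scans have no inputs and can start together, while the join, aggregate, and sort vertices each wait on the vertex before them. Expressed as MapReduce, the same pipeline would typically need several separate jobs; as a DAG it is a single unit of work.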
Capabilities
- High Performance: Can run significantly faster than an equivalent chain of MapReduce jobs by removing per-job startup overhead and avoiding writes of intermediate results to HDFS between stages.
- Resource Optimization: Efficient use of cluster resources through dynamic task scheduling and container reuse.
- Scalability: Designed to scale with large datasets and complex workflows across distributed environments.
- Integration with Hadoop Ecosystem: Seamlessly integrates with Apache Hadoop, Hive, Pig, and other tools.
- Customizability: Users can embed application-specific data into the Tez UI and configure runtime behavior via tez-site.xml.
- Deployment Flexibility: Can be deployed using full or minimal tarballs, with or without Hadoop dependencies, supporting various cluster configurations.
Benefits
- Reduced Latency: Faster job execution by minimizing disk I/O and leveraging in-memory operations.
- Simplified Workflow Management: Combines multiple MapReduce jobs into a single Tez DAG, reducing complexity.
- Improved Debugging and Monitoring: Tez UI provides detailed insights into task execution and performance metrics.
- Cost Efficiency: Better resource utilization leads to lower operational costs in large-scale data environments.
- Future-Proof Architecture: Modular design allows for easy extension and adaptation to evolving data processing needs.