Name: Apache Tez
Brand: The Apache Software Foundation


Apache Tez is a data processing framework built on Hadoop YARN that executes complex directed acyclic graphs (DAGs) of tasks. A workload that would otherwise need a chain of MapReduce jobs can run as a single Tez job, improving performance, resource efficiency, and flexibility for big data applications such as Apache Hive and Apache Pig.

Vendor

The Apache Software Foundation

Company Website

https://tez.apache.org

YouTube

https://www.youtube.com/c/TheApacheFoundation

Product details

Apache Tez

Apache Tez is a data processing framework built on top of Apache Hadoop YARN. It executes complex directed acyclic graphs (DAGs) of tasks, offering an alternative to traditional MapReduce for big data applications. Tez is designed to improve performance, resource usage, and flexibility for data-intensive applications such as Apache Hive and Apache Pig.

Features

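Expressive DAG-based Execution Model: Lets users define complex data processing workflows as a DAG of vertices (processing steps) connected by edges (data movement between steps).
Flexible Input-Processor-Output Runtime Model: Each task is composed of pluggable inputs, a processor, and outputs, so applications can supply custom implementations for their own data handling.
Data Type Agnostic: Tez is only concerned with moving data between tasks; applications choose their own data formats (for example, key-value pairs or tuples) without Tez-specific adaptations.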
Tez UI: A dedicated web-based interface for monitoring live and historical Tez applications, integrated with YARN’s Application Timeline Server.
Tez Shuffle Handler: Optimized data shuffling for DAGs, supporting features like auto-reduce parallelism and intermediate data cleanup.
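Dynamic Plan Reconfiguration: Allows the physical plan, such as the number of tasks in a vertex or how data is routed between vertices, to be adjusted at runtime based on observed data.
Tez Sessions: Supports long-running sessions in which a single application master runs multiple DAGs, avoiding per-job startup overhead.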
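To make the DAG execution model above more concrete, here is a minimal Java sketch. The `DagSketch` class is hypothetical and is not the Tez API (Tez's own DAG types live in `org.apache.tez.dag.api`); it models vertices and edges for an assumed join followed by an aggregation, and derives one valid order in which the vertices could start by using Kahn's topological sort. The vertex names are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Minimal model of a DAG of processing vertices.
 * Hypothetical illustration only: this is NOT the org.apache.tez.dag.api API.
 */
class DagSketch {
    // vertex name -> downstream vertex names, kept in insertion order
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    DagSketch addVertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    DagSketch addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        edges.get(from).add(to);
        return this;
    }

    /** Kahn's algorithm: returns a valid start order, or throws if the graph has a cycle. */
    List<String> topologicalOrder() {
        // count incoming edges for every vertex
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) inDegree.put(v, 0);
        for (List<String> targets : edges.values())
            for (String t : targets) inDegree.merge(t, 1, Integer::sum);

        // vertices with no inputs are ready immediately
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            // a downstream vertex becomes ready once all of its inputs are done
            for (String t : edges.get(v))
                if (inDegree.merge(t, -1, Integer::sum) == 0) ready.add(t);
        }
        if (order.size() != edges.size())
            throw new IllegalStateException("graph contains a cycle; not a DAG");
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical join + aggregation pipeline expressed as one DAG.
        DagSketch dag = new DagSketch()
                .addEdge("scan_orders", "join")
                .addEdge("scan_customers", "join")
                .addEdge("join", "aggregate");
        System.out.println(dag.topologicalOrder());
    }
}
```

Running `main` prints `[scan_orders, scan_customers, join, aggregate]`. Adding an edge that closes a loop makes `topologicalOrder()` throw, which is why the model requires the graph to be acyclic.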

Capabilities

High Performance: Can run significantly faster than equivalent MapReduce pipelines by reducing per-job overhead and avoiding intermediate writes to HDFS between processing stages.
Resource Optimization: Efficient use of cluster resources through dynamic task scheduling and container reuse.
Scalability: Designed to scale with large datasets and complex workflows across distributed environments.
Integration with Hadoop Ecosystem: Seamlessly integrates with Apache Hadoop, Hive, Pig, and other tools.
Customizability: Users can embed application-specific data into the Tez UI and configure runtime behavior via tez-site.xml.
Deployment Flexibility: Can be deployed from a full tarball that bundles the required Hadoop dependencies or from a minimal tarball that relies on the Hadoop libraries already installed on the cluster, supporting various cluster configurations.
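Auto-reduce parallelism, mentioned under the Tez Shuffle Handler and dynamic reconfiguration features, is one concrete way runtime plan changes save cluster resources. The Java sketch below is a simplified, hypothetical version of the idea and not Tez's actual `ShuffleVertexManager` logic: once the size of upstream output is known, the downstream vertex's task count is lowered so that each task receives roughly a target amount of input. The method name, parameters, and the 100 MB target are assumptions chosen for illustration.

```java
/**
 * Simplified sketch of "auto-reduce parallelism". Hypothetical illustration;
 * Tez's ShuffleVertexManager uses its own, more detailed logic.
 */
class AutoReduceSketch {

    /**
     * @param configuredTasks     parallelism planned before the job ran
     * @param observedOutputBytes total upstream output destined for this vertex
     * @param desiredBytesPerTask target input size per downstream task
     * @param minTasks            lower bound on the new parallelism
     * @return new task count, never larger than configuredTasks
     */
    static int reducedParallelism(int configuredTasks, long observedOutputBytes,
                                  long desiredBytesPerTask, int minTasks) {
        if (configuredTasks <= 0 || desiredBytesPerTask <= 0 || minTasks <= 0)
            throw new IllegalArgumentException("task counts and target size must be positive");
        if (observedOutputBytes < 0)
            throw new IllegalArgumentException("observed output cannot be negative");
        // ceil(observed / desired) using integer arithmetic
        long needed = (observedOutputBytes + desiredBytesPerTask - 1) / desiredBytesPerTask;
        long bounded = Math.max(minTasks, needed);
        // in this sketch parallelism is only ever lowered, never raised
        return (int) Math.min(configuredTasks, bounded);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 1000 tasks were planned, but only 2.5 GB of intermediate data was observed
        System.out.println(reducedParallelism(1000, 2560 * mb, 100 * mb, 1)); // prints 26
    }
}
```

The sketch only shrinks parallelism because several planned partitions can be merged into one task, whereas adding tasks would require the upstream vertex to have produced more partitions.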

Benefits

Reduced Latency: Faster job execution by minimizing disk I/O and leveraging in-memory operations.
Simplified Workflow Management: Combines multiple MapReduce jobs into a single Tez DAG, reducing complexity.
Improved Debugging and Monitoring: Tez UI provides detailed insights into task execution and performance metrics.
Cost Efficiency: Better resource utilization leads to lower operational costs in large-scale data environments.
Future-Proof Architecture: Modular design allows for easy extension and adaptation to evolving data processing needs.
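The workflow-simplification and reduced-I/O benefits above can be illustrated by comparing how a pipeline with several shuffle (group-by or sort) stages is planned as chained MapReduce jobs versus as one DAG. This Java sketch is hypothetical bookkeeping, not a Tez or Hive API. It assumes each follow-on MapReduce job's map phase only re-reads the previous job's output, so the DAG can drop that step and connect reduce stages directly (a map, reduce, reduce, ... shape).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical comparison (not a Tez or Hive API) of a pipeline with
 * `shuffleStages` group-by/sort steps planned as chained MapReduce jobs
 * versus a single DAG. Assumes follow-on map phases only re-read the
 * previous job's output, so the DAG can omit them.
 */
class ChainVsDagSketch {

    static final class Plan {
        final int jobs;                   // separately submitted jobs
        final int hdfsIntermediateWrites; // results materialized to HDFS between jobs
        final List<String> stages;        // processing steps in execution order

        Plan(int jobs, int hdfsIntermediateWrites, List<String> stages) {
            this.jobs = jobs;
            this.hdfsIntermediateWrites = hdfsIntermediateWrites;
            this.stages = List.copyOf(stages);
        }

        @Override
        public String toString() {
            return jobs + " job(s), " + hdfsIntermediateWrites
                    + " intermediate HDFS write(s), stages " + stages;
        }
    }

    static Plan chainedMapReduce(int shuffleStages) {
        requirePositive(shuffleStages);
        List<String> stages = new ArrayList<>();
        for (int i = 1; i <= shuffleStages; i++) {
            stages.add("job" + i + ":map");
            stages.add("job" + i + ":reduce");
        }
        // every job except the last writes its output to HDFS for the next job to read
        return new Plan(shuffleStages, shuffleStages - 1, stages);
    }

    static Plan singleDag(int shuffleStages) {
        requirePositive(shuffleStages);
        List<String> stages = new ArrayList<>();
        stages.add("map");
        // each reduce stage feeds the next over a shuffle edge inside the same DAG,
        // so nothing is written to HDFS until the final output
        for (int i = 1; i <= shuffleStages; i++) stages.add("reduce" + i);
        return new Plan(1, 0, stages);
    }

    private static void requirePositive(int n) {
        if (n < 1) throw new IllegalArgumentException("need at least one shuffle stage");
    }

    public static void main(String[] args) {
        System.out.println(chainedMapReduce(3));
        System.out.println(singleDag(3));
    }
}
```

Shuffle data between stages is still written to local disk on the producing nodes in both plans; under these assumptions the saving is the replicated HDFS write and re-read between jobs, plus the per-job submission overhead.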
