NVIDIA TensorRT

NVIDIA® TensorRT™ is an ecosystem of tools for developers to achieve high-performance deep learning inference. TensorRT includes inference compilers, runtimes, and model optimizations that deliver low latency and high throughput for production applications. 

Vendor

NVIDIA

Company Website

[Image: How TensorRT works]
Product details

The TensorRT ecosystem includes the TensorRT compiler, TensorRT-LLM, TensorRT Model Optimizer, TensorRT for RTX, and TensorRT Cloud.

Features

  • High Performance: Speeds up inference by up to 36x compared with CPU-only platforms using the NVIDIA CUDA parallel programming model.
  • Model Optimization: Includes quantization, layer and tensor fusion, and kernel tuning techniques. Supports FP8, FP4, INT8, INT4, and advanced techniques such as AWQ.
  • Large Language Model Inference: TensorRT-LLM accelerates and optimizes inference performance of large language models with a simplified Python API.
  • Cloud Compilation: TensorRT Cloud generates hyper-optimized engines for a given set of constraints and KPIs, automatically determining the best engine configuration.
  • Framework Integrations: Direct integration with PyTorch and Hugging Face for faster inference. ONNX parser for importing models from popular frameworks.
  • Deployment and Scaling: TensorRT-optimized models are deployed, run, and scaled with NVIDIA Dynamo Triton inference-serving software.
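
To illustrate the INT8 quantization mentioned under Model Optimization, here is a minimal sketch of symmetric per-tensor quantization with max-absolute-value ("amax") calibration in pure NumPy. It is a simplified model of the idea only, not TensorRT's actual implementation; the weight values and helper names are invented for the example:

```python
import numpy as np

def int8_quantize(x, scale):
    """Symmetric per-tensor INT8 quantization: q = clip(round(x / scale), -128, 127)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def int8_dequantize(q, scale):
    """Map INT8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

# Toy weights; the scale comes from amax calibration over the tensor.
weights = np.array([0.5, -1.25, 3.0, -3.0], dtype=np.float32)
scale = np.abs(weights).max() / 127.0
q = int8_quantize(weights, scale)        # array([  21,  -53,  127, -127], dtype=int8)
recon = int8_dequantize(q, scale)
# Reconstruction error is bounded by about half a quantization step (scale / 2).
```

Storing weights and activations as 8-bit codes plus one scale per tensor is what cuts memory traffic; the FP8, FP4, INT4, and AWQ paths listed above follow the same scale-based idea with different formats and calibration schemes.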

Benefits

  • Optimized Performance: Delivers low latency and high throughput for production applications.
  • Efficiency: Reduces memory bandwidth and latency, essential for real-time services and embedded applications.
  • Versatility: Suitable for a wide range of applications, including intelligent video analytics, speech AI, recommender systems, and AI-based cybersecurity.
  • Scalability: Supports deployment across edge, laptops, desktops, and data centers.
  • Developer-Friendly: Provides a unified path to deploy AI models with rich APIs and reusable code.