
NVIDIA TensorRT
NVIDIA® TensorRT™ is an ecosystem of tools for developers to achieve high-performance deep learning inference. TensorRT includes inference compilers, runtimes, and model optimizations that deliver low latency and high throughput for production applications.
Vendor
NVIDIA
Company Website

Product details
The TensorRT ecosystem includes the TensorRT compiler, TensorRT-LLM, TensorRT Model Optimizer, TensorRT for RTX, and TensorRT Cloud.
Features
- High Performance: Speeds up inference by up to 36X compared to CPU-only platforms, using the NVIDIA CUDA parallel programming model.
- Model Optimization: Includes quantization, layer and tensor fusion, and kernel tuning techniques. Supports FP8, FP4, INT8, and INT4 precisions, plus advanced techniques such as AWQ.
- Large Language Model Inference: TensorRT-LLM accelerates and optimizes inference performance of large language models with a simplified Python API.
- Cloud Compilation: TensorRT Cloud generates hyper-optimized engines for given constraints and KPIs, automatically determining the best engine configuration.
- Framework Integrations: Direct integration with PyTorch and Hugging Face for faster inference. ONNX parser for importing models from popular frameworks.
- Deployment and Scaling: TensorRT-optimized models are deployed, run, and scaled with NVIDIA Dynamo Triton inference-serving software.
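The quantization mentioned above trades a small amount of numeric precision for large gains in memory bandwidth and throughput. A minimal, self-contained sketch of symmetric INT8 quantization in plain Python illustrates the idea; this follows the standard max-calibration scheme and is not TensorRT's actual calibrator API:

```python
# Illustrative sketch of symmetric INT8 quantization (max calibration).
# Plain Python for clarity; TensorRT performs this during engine building.

def int8_scale(values):
    """Max calibration: map the largest |value| onto the INT8 limit 127."""
    amax = max(abs(v) for v in values)
    return amax / 127.0 if amax else 1.0

def quantize(values, scale):
    """Real -> INT8: divide by scale, round, clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """INT8 -> approximate real values."""
    return [q * scale for q in qvalues]

weights = [0.02, -1.27, 0.5, 0.9981, -0.33]
scale = int8_scale(weights)
q = quantize(weights, scale)
recovered = dequantize(q, scale)
# Each recovered weight lies within one quantization step (scale) of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, recovered))
```

Each FP32 weight shrinks to one byte, which is where the bandwidth and latency savings come from; TensorRT additionally uses calibration data or quantization-aware training to pick scales that minimize accuracy loss.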
Benefits
- Optimized Performance: Delivers low latency and high throughput for production applications.
- Efficiency: Reduces memory bandwidth and latency, essential for real-time services and embedded applications.
- Versatility: Suitable for a wide range of applications, including intelligent video analytics, speech AI, recommender systems, and AI-based cybersecurity.
- Scalability: Supports deployment across edge devices, laptops, desktops, and data centers.
- Developer-Friendly: Provides a unified path to deploy AI models with rich APIs and reusable code.