Triton Inference Server
NVIDIA

Triton Inference Server deploys AI models from any framework on any GPU or CPU infrastructure, supporting cloud and edge inferencing.

Vendor

NVIDIA

Company Website

Product details

Triton Inference Server is open-source software that lets teams deploy trained AI models from any framework, from local or cloud storage, on any GPU- or CPU-based infrastructure in the cloud, data center, or embedded devices. Triton supports the HTTP/REST and gRPC protocols, allowing remote clients to request inferencing for any model managed by the server. For edge deployments, Triton is available as a shared library with a C API that allows the full functionality of Triton to be included directly in an application.
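
As a rough illustration of the remote-inference workflow, here is a minimal sketch of a Python client sending a request over Triton's HTTP/REST endpoint using the `tritonclient` package. The model name (`my_model`), the tensor names (`input__0`, `output__0`), the shape, and the datatype are placeholders, not values from any real deployment; substitute the names and shapes from your own model configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server exposing the default HTTP port (8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder input: adjust shape and dtype to match your model's config.
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# "input__0" is an assumed tensor name for illustration only.
infer_input = httpclient.InferInput("input__0", list(input_data.shape), "FP32")
infer_input.set_data_from_numpy(input_data)

# "my_model" is an assumed model name; Triton serves any model in its repository.
response = client.infer(model_name="my_model", inputs=[infer_input])

# Retrieve the named output tensor ("output__0" is also assumed) as a NumPy array.
output = response.as_numpy("output__0")
print(output.shape)
```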

Features

  • Framework Support: Supports PyTorch, TensorRT, ONNX, OpenVINO, and more.
  • Protocol Support: Provides HTTP/REST and gRPC protocols for remote inferencing.
  • Edge Deployment: Available as a shared library with a C API for edge applications.
  • Docker Images: Multiple Docker images available for different use cases, including support for Jetson Orin devices.
  • Client Libraries: Includes Python and C++ client libraries, client examples, and performance analysis tools (a minimal Python client sketch follows this list).
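
The sketch below uses the Python gRPC client from the same `tritonclient` package to check server and model health and to fetch model metadata, which is a common first step before sending inference requests. The model name `my_model` is again a placeholder, not part of the product description.

```python
import tritonclient.grpc as grpcclient

# Connect to a Triton server exposing the default gRPC port (8001).
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Liveness/readiness checks, useful for orchestration health probes.
print("Server live: ", client.is_server_live())
print("Server ready:", client.is_server_ready())

# "my_model" is a placeholder; use a model name from your model repository.
model_name = "my_model"
print("Model ready: ", client.is_model_ready(model_name))

# Query metadata (inputs, outputs, available versions) for the model.
metadata = client.get_model_metadata(model_name)
print(metadata)
```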

Benefits

  • Versatile Deployment: Deploy AI models on any GPU or CPU infrastructure.
  • Scalable Solution: Scales from cloud to edge deployments.
  • High Performance: Optimized inference execution on both CPUs and GPUs.
  • Ease of Use: Simplifies the deployment and management of AI models with comprehensive tools and libraries.
  • Enterprise Support: Supported by NVIDIA AI Enterprise for robust, enterprise-grade solutions.