
NVIDIA DGX Cloud Serverless Inference
NVIDIA DGX™ Cloud Serverless Inference is a high-performance, serverless AI inference solution that accelerates AI innovation with auto-scaling, cost-efficient GPU utilization, multi-cloud flexibility, and seamless scalability.
Vendor
NVIDIA
Company Website

Product details
NVIDIA DGX™ Cloud Serverless Inference simplifies AI workload deployment across multiple regions with auto-scaling, load balancing, and event-driven execution. Developers can bring their own models, containers, or Helm charts and run them immediately on NVIDIA GPUs in DGX Cloud or partner infrastructure.
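The bring-your-own-container flow described above can be sketched as a single registration call that points a serverless function at a container image in the NGC Private Registry. The field names, port, and endpoint in this sketch are illustrative assumptions, not the documented API:

```python
import json

# Hypothetical payload builder for registering a serverless inference
# function from a container image (all field names are assumptions).
def build_function_spec(name: str, image: str, port: int = 8000,
                        min_instances: int = 0, max_instances: int = 4) -> dict:
    return {
        "name": name,
        "containerImage": image,            # image hosted in the NGC Private Registry
        "inferencePort": port,              # port the container listens on
        "autoscaling": {
            "minInstances": min_instances,  # 0 enables scale-to-zero
            "maxInstances": max_instances,
        },
    }

spec = build_function_spec("my-llm", "nvcr.io/myorg/my-llm:1.0")
print(json.dumps(spec, indent=2))

# A real deployment would submit this spec to the control-plane API
# (or the equivalent CLI/UI step), e.g. something like:
# requests.post(API_URL + "/functions", json=spec,
#               headers={"Authorization": f"Bearer {token}"})
```

The same spec shape applies whether the workload is a NIM microservice, a plain container, or a model artifact; only the image reference changes.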
Features
- Auto-Scaling to Zero: Scale down to zero instances during periods of inactivity to optimize resource utilization and reduce costs.
- BYO Observability: Integrate preferred monitoring tools, such as Splunk, for comprehensive insights into AI workloads.
- Broad Workload Support: Flexible deployment options for NIM microservices, containers, models, and Helm charts hosted within the NGC™ Private Registry.
- Targeted Deployment: Choose instance types with specific characteristics, such as number of GPUs, CPU cores, architecture, storage, and geographical location.
- Flexible Deployment Options: Deploy inference pipelines or data preprocessing workflows in containers optimized for NVIDIA GPUs via API, CLI, or UI.
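One practical consequence of scale-to-zero is that a client may hit a cold endpoint and need to wait while an instance spins up. The polling convention below (a "pending" status with a poll-by-id follow-up) is an assumed pattern for illustration, not the product's documented protocol; the HTTP transport is injected so URL and auth details stay out of the sketch:

```python
import time

# Hypothetical invocation helper. An endpoint scaling up from zero may
# answer with a "pending" status while an instance warms up; this sketch
# polls until a result arrives or a retry budget is exhausted.
def invoke(post, payload: dict, poll_interval: float = 0.0, max_polls: int = 10):
    resp = post(payload)
    for _ in range(max_polls):
        if resp.get("status") != "pending":
            return resp
        time.sleep(poll_interval)
        resp = post({"pollId": resp["id"]})  # assumed poll-by-id convention
    raise TimeoutError("function did not return a result in time")

# Fake transport simulating one cold-start "pending" response, then a result.
_responses = iter([
    {"status": "pending", "id": "req-1"},
    {"status": "done", "output": "hello"},
])
result = invoke(lambda p: next(_responses), {"prompt": "hi"})
print(result["output"])  # → hello
```

In production the injected `post` would be a thin wrapper over an HTTP client carrying the endpoint URL and bearer token.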
Benefits
- Optimize Resource Utilization: Scaling to zero instances during inactivity eliminates spend on idle GPU capacity.
- Comprehensive Insights: Integrate with preferred monitoring tools for detailed insights into AI workloads.
- Flexible and Scalable: Broad workload support and targeted deployment options provide flexibility and scalability.
- Simplified AI Deployment: Auto-scaling, load balancing, and event-driven execution reduce the operational work of running inference across regions.
- Cost Efficiency: Reduces operational costs with optimized GPU utilization and flexible deployment options.