
NVIDIA DGX Cloud Serverless Inference
NVIDIA DGX™ Cloud Serverless Inference is a high-performance, serverless AI inference solution that accelerates AI innovation with auto-scaling, cost-efficient GPU utilization, multi-cloud flexibility, and seamless scalability.
Vendor
NVIDIA
Company Website

Product details
NVIDIA DGX™ Cloud Serverless Inference simplifies AI workload deployment across multiple regions with auto-scaling, load balancing, and event-driven execution. Developers can bring their own models, containers, or Helm charts and run them immediately on NVIDIA GPUs in DGX Cloud or partner infrastructure.
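The bring-your-own-container flow described above can be sketched as a single registration call that points a serverless function at a container image in the NGC Private Registry. The field names, port, and endpoint in this sketch are illustrative assumptions, not the documented API:

```python
import json

# Hypothetical payload builder for registering a serverless inference
# function from a container image (all field names are assumptions).
def build_function_spec(name: str, image: str, port: int = 8000,
                        min_instances: int = 0, max_instances: int = 4) -> dict:
    return {
        "name": name,
        "containerImage": image,            # image hosted in the NGC Private Registry
        "inferencePort": port,              # port the container listens on
        "autoscaling": {
            "minInstances": min_instances,  # 0 enables scale-to-zero
            "maxInstances": max_instances,
        },
    }

spec = build_function_spec("my-llm", "nvcr.io/myorg/my-llm:1.0")
print(json.dumps(spec, indent=2))

# A real deployment would submit this spec to the control-plane API
# (or the equivalent CLI/UI step), e.g. something like:
# requests.post(API_URL + "/functions", json=spec,
#               headers={"Authorization": f"Bearer {token}"})
```

The same spec shape applies whether the workload is a NIM microservice, a plain container, or a model artifact; only the image reference changes.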
Features
- Auto-Scaling to Zero: Scale down to zero instances during periods of inactivity to optimize resource utilization and reduce costs.
- BYO Observability: Integrate preferred monitoring tools, such as Splunk, for comprehensive insights into AI workloads.
- Broad Workload Support: Flexible deployment options for NIM microservices, containers, models, and Helm charts hosted within the NGC™ Private Registry.
- Targeted Deployment: Choose instance types with specific characteristics, such as number of GPUs, CPU cores, architecture, storage, and geographical location.
- Flexible Deployment Options: Deploy inference pipelines or data preprocessing workflows in containers optimized for NVIDIA GPUs via API, CLI, or UI.
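One practical consequence of scale-to-zero is that a client may hit a cold endpoint and need to wait while an instance spins up. The polling convention below (a "pending" status with a poll-by-id follow-up) is an assumed pattern for illustration, not the product's documented protocol; the HTTP transport is injected so URL and auth details stay out of the sketch:

```python
import time

# Hypothetical invocation helper. An endpoint scaling up from zero may
# answer with a "pending" status while an instance warms up; this sketch
# polls until a result arrives or a retry budget is exhausted.
def invoke(post, payload: dict, poll_interval: float = 0.0, max_polls: int = 10):
    resp = post(payload)
    for _ in range(max_polls):
        if resp.get("status") != "pending":
            return resp
        time.sleep(poll_interval)
        resp = post({"pollId": resp["id"]})  # assumed poll-by-id convention
    raise TimeoutError("function did not return a result in time")

# Fake transport simulating one cold-start "pending" response, then a result.
_responses = iter([
    {"status": "pending", "id": "req-1"},
    {"status": "done", "output": "hello"},
])
result = invoke(lambda p: next(_responses), {"prompt": "hi"})
print(result["output"])  # → hello
```

In production the injected `post` would be a thin wrapper over an HTTP client carrying the endpoint URL and bearer token.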
Benefits
- Optimize Resource Utilization: Scaling to zero instances during inactivity eliminates spend on idle GPU capacity.
- Comprehensive Insights: Integrate with preferred monitoring tools for detailed insights into AI workloads.
- Flexible and Scalable: Broad workload support and targeted deployment options provide flexibility and scalability.
- Simplified AI Deployment: Auto-scaling, load balancing, and event-driven execution reduce the operational work of running inference across regions.
- Cost Efficiency: Reduces operational costs with optimized GPU utilization and flexible deployment options.