
DCGM Exporter
DCGM Exporter collects health and performance metrics from NVIDIA GPUs in Kubernetes and exposes them for Prometheus to scrape.
Vendor
NVIDIA
Product details
DCGM Exporter is a Prometheus exporter that monitors the health of NVIDIA GPUs in Kubernetes clusters and collects metrics from them. It uses the Go bindings for DCGM (NVIDIA Data Center GPU Manager) to collect GPU telemetry and exposes the resulting metrics to Prometheus over an HTTP endpoint (/metrics). DCGM Exporter can run standalone or be deployed as part of the NVIDIA GPU Operator.
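To make the /metrics endpoint more concrete, the following minimal Python sketch parses a payload in the Prometheus text exposition format into samples. The metric name DCGM_FI_DEV_GPU_UTIL is one of the fields DCGM Exporter reports, but the label set, hostnames, and values in EXAMPLE are made up for illustration, and parse_metrics is a hypothetical helper, not part of DCGM Exporter or Prometheus.

```python
import re

# One sample line in the text exposition format: name{labels} value
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value" (escaped characters allowed)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples.

    Simplification: escape sequences in label values are kept as-is.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:  # ignore lines that are not samples
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative payload; labels and values are assumptions for this sketch.
EXAMPLE = '''# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 12
'''

for name, labels, value in parse_metrics(EXAMPLE):
    print(name, labels["gpu"], value)
```

Against a running exporter, the same function could be applied to the body returned by urllib.request.urlopen for an address such as http://localhost:9400/metrics, where the host and port are assumptions about the deployment. In practice Prometheus does this scraping itself; the sketch only shows what the endpoint's output looks like to a client.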
Features
- Prometheus Integration: Publishes GPU metrics in a format that Prometheus can scrape.
- GPU Telemetry Collection: Uses DCGM to collect detailed GPU telemetry.
- HTTP Endpoint: Exposes GPU metrics via an HTTP endpoint (/metrics).
- Standalone or Integrated: Can be used standalone or as part of the NVIDIA GPU Operator.
- Go Bindings: Accesses DCGM through its Go bindings for efficient telemetry collection.
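As a rough illustration of the exporter side of these features, the sketch below renders one gauge in the Prometheus text exposition format and serves it at /metrics using only the Python standard library. It is not how DCGM Exporter is implemented (the real exporter is written in Go and reads from DCGM); read_gpu_temps is a hypothetical stand-in that returns fixed values, and the port in serve is an assumption.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_gauge(name, help_text, samples):
    """Render one gauge family; samples is a list of (labels_dict, value).

    Simplification: label values are not escaped.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


def read_gpu_temps():
    """Hypothetical stand-in for a DCGM query; returns fixed values."""
    return [({"gpu": "0"}, 41), ({"gpu": "1"}, 39)]


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics with the rendered gauge, 404 for other paths."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauge(
            "DCGM_FI_DEV_GPU_TEMP", "GPU temperature (in C).", read_gpu_temps()
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9400):
    """Serve /metrics until interrupted (not called in this sketch)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


# Show what a scrape of /metrics would return for the stand-in data.
print(render_gauge("DCGM_FI_DEV_GPU_TEMP", "GPU temperature (in C).",
                   read_gpu_temps()), end="")
```

Calling serve() would start the endpoint; it is left uncalled here so the sketch only prints the payload a scrape would receive.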
Benefits
- Enhanced Monitoring: Provides comprehensive monitoring of NVIDIA GPUs in Kubernetes clusters.
- Detailed Metrics: Collects detailed GPU metrics for performance analysis.
- Flexible Deployment: Runs either standalone or as part of the NVIDIA GPU Operator.
- Efficient Telemetry: Collects telemetry efficiently through the DCGM Go bindings.
- Improved Health Monitoring: Reports GPU health alongside performance metrics.