NVIDIA System Management (NVSM)

A software framework for monitoring NVIDIA DGX nodes, providing health monitoring, system alerts, and log generation.

Vendor

NVIDIA

Product details

NVIDIA System Management (NVSM) is a software framework for monitoring NVIDIA DGX nodes in a data center. On DGX servers it provides active health monitoring, system alerts, and log generation. On DGX Station it is limited to the CLI, which is used to check system health and obtain diagnostic information. The v1.0.0-21.07.x release is the first release of the NVSM containers and supports three main operations: show health, dump health, and show versions.
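
As a rough illustration of how the three operations named above might be scripted, the sketch below maps each operation to a command line and runs it only if the tool is present. It assumes the executable is called `nvsm`, that it is normally run with `sudo`, and that each operation name is passed as two subcommand words. None of these details are stated in this listing, and the helper names `build_nvsm_command` and `run_nvsm` are hypothetical, so check the NVSM documentation for your release before relying on it.

```python
import shutil
import subprocess

# The three operations named in the product details.
# Assumption: each one is passed to the CLI as two subcommand words.
NVSM_OPERATIONS = {
    "show health": ["show", "health"],
    "dump health": ["dump", "health"],
    "show versions": ["show", "versions"],
}


def build_nvsm_command(operation, use_sudo=True):
    """Return the argument list for one of the three operations."""
    if operation not in NVSM_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    # Assumption: the executable is named `nvsm` and needs root privileges.
    prefix = ["sudo", "nvsm"] if use_sudo else ["nvsm"]
    return prefix + NVSM_OPERATIONS[operation]


def run_nvsm(operation, use_sudo=True):
    """Run the operation and return its stdout, or None if nvsm is not on PATH."""
    if shutil.which("nvsm") is None:
        return None
    result = subprocess.run(
        build_nvsm_command(operation, use_sudo),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Calling `run_nvsm("show health")` on a machine without the tool returns `None` instead of raising, which keeps the sketch safe to try outside a DGX system.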

Features

  • Active Health Monitoring: Continuously monitors the health of NVIDIA DGX nodes.
  • System Alerts: Provides alerts for any system issues detected.
  • Log Generation: Generates logs for system diagnostics and troubleshooting.
  • CLI Commands: Includes commands for showing health, dumping health, and showing versions.
  • Health Reports: Produces health report files suitable for support tickets.
  • Version Information: Displays versions of installed packages and firmware.

Benefits

  • Enhanced Monitoring: Ensures continuous monitoring of system health for NVIDIA DGX nodes.
  • Proactive Alerts: Alerts administrators to potential issues before they become critical.
  • Comprehensive Diagnostics: Facilitates troubleshooting with detailed logs and health reports.
  • Ease of Use: Simple CLI commands for quick access to system health and version information.