
NVIDIA NetQ
NVIDIA NetQ™ is a highly scalable, modern network operations toolset that provides visibility, troubleshooting, and validation of your Cumulus fabrics in real time. NetQ utilizes telemetry and delivers actionable insights about the health of your data center network, ensuring your AI network fabric is operating smoothly.
Vendor
NVIDIA
Company Website
ethernet-s…tq-3546829.pdf
Product details
Features
- Network Management: Access powerful tools to manage your NVIDIA Cumulus environments at the push of a button.
- Advanced Telemetry: Collect real-time data that enables deep troubleshooting, visibility, and automated workflows from a single GUI.
- Snapshot and Compare: Compare network configurations from before and after a change to reduce the risk of disruption.
- Network-Wide Visibility: See real-time visualizations about the health of your network with NetQ’s rich GUI.
- Flow Telemetry: Analyze fabric-wide latency and buffer occupancy data of all the paths of a 4-tuple or 5-tuple flow to identify congestion points.
- Preventive Validation: Catch manual errors before they’re rolled into production.
- Diagnostic Troubleshooting: Diagnose the root cause of state deviations with advanced diagnostic tools.
- gNMI Collection: Use the gRPC Network Management Interface (gNMI) specification to stream What Just Happened (WJH) telemetry data from the NetQ agent.
- RoCE Support: Monitor your Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) environment with NetQ to gain actionable insights into your AI network fabric.
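To make the Snapshot and Compare idea concrete, here is a minimal Python sketch of comparing two captures of network state. It is an illustration only and does not use or reflect the NetQ API: the `diff_snapshots` helper and the interface data are hypothetical, and a snapshot is assumed to be a simple mapping of interface name to link state.

```python
def diff_snapshots(before, after):
    """Return entries added, removed, and changed between two snapshots.

    Both arguments are plain dicts (hypothetical format: name -> state).
    """
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots taken before and after a network change.
before = {"swp1": "up", "swp2": "up", "swp3": "down"}
after = {"swp1": "up", "swp2": "down", "swp4": "up"}

result = diff_snapshots(before, after)
print(result["added"])    # interfaces present only after the change
print(result["removed"])  # interfaces present only before the change
print(result["changed"])  # interfaces whose state differs
```

Splitting the result into added, removed, and changed entries keeps an unexpected state change (here, `swp2` going down) visible at a glance instead of buried in a full listing.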
Benefits
- Streamline Upgrades: Experience push-button simplicity for network management with NetQ's intuitive GUI.
- Gain Real-Time Intelligence: Correlate configuration and operational status, and instantly identify and track state changes for your entire data center.
- Reduce Downtime: Optimize AI operations with quick alerts, faster troubleshooting, and proactive detection.
- Remediate Faster: Detect faulty network states and get alerts with precise fault location data.
- Remove Complexity: Simplify operations and increase operator efficiency by quickly highlighting issues through visualizations and alerts.
- Diagnose Root Causes: Trace network paths, replay the network state at any time in the past, review fabric-wide event change logs, and diagnose the root cause of state deviation.