NVSHMEM

NVSHMEM™ is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA® streams.

Vendor

NVIDIA

Product details

Features

  • Efficient, Strong Scaling: Enables long-running kernels that include both communication and computation, reducing overheads that can limit an application’s performance when strong scaling.
  • Low Overhead: One-sided communication primitives reduce overhead by allowing the initiating process or GPU thread to specify all information required to complete a data transfer.
  • Naturally Asynchronous: Asynchronous communications make it easier for programmers to interleave computation and communication, thereby increasing overall application performance.
  • Platform Support: Supports the Blackwell SM100 architecture on B200-based systems connected with NVLink 5.
  • Advanced Algorithms: Includes one-shot and two-shot NVLink SHARP (NVLS) allreduce algorithms for various data types on platforms with NVLink 4 and NVLink 5.
  • LLVM IR-Compliant: Provides an LLVM IR-compliant bitcode device library that supports integration with MLIR-compliant compiler toolchains for new and upcoming Python DSLs.
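
To make the one-sided, GPU-initiated model described above concrete, here is a minimal sketch of a ring shift in which each GPU writes its own ID into the memory of its neighbor. It assumes NVSHMEM is installed, that the job is launched with one process (PE) per GPU, and that all PEs run on a single node. The kernel name `ring_shift` and the variable names are illustrative choices, not part of the product page.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Each PE writes its own ID into the symmetric buffer of the next PE.
// The put is one-sided: the target PE does not post a matching receive.
__global__ void ring_shift(int *destination) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;

    // GPU-initiated put of a single int to the peer's copy of `destination`.
    nvshmem_int_p(destination, mype, peer);
}

int main(void) {
    int msg = -1;
    cudaStream_t stream;

    nvshmem_init();

    // Assumption: one PE per GPU; pick the GPU by this PE's rank within the node.
    int mype_node = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
    cudaSetDevice(mype_node);
    cudaStreamCreate(&stream);

    // Symmetric allocation: every PE gets a buffer at the same symmetric
    // address, which together form part of the global address space.
    int *destination = (int *) nvshmem_malloc(sizeof(int));

    ring_shift<<<1, 1, 0, stream>>>(destination);

    // Stream-ordered barrier: runs after the kernel on this stream and
    // returns once all PEs have arrived and prior puts are complete.
    nvshmemx_barrier_all_on_stream(stream);

    cudaMemcpyAsync(&msg, destination, sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    printf("PE %d received %d\n", nvshmem_my_pe(), msg);

    nvshmem_free(destination);
    nvshmem_finalize();
    return 0;
}
```

The sketch touches each access path named in the description: the put is issued from inside a kernel, the barrier is enqueued on a CUDA stream, and setup and teardown are CPU-initiated. Building it typically requires `nvcc` with relocatable device code enabled and linking against the NVSHMEM library. Exact flags and the launcher depend on the installation.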

Benefits

  • Increased Performance: Long-running kernels that combine communication and computation reduce the overheads that can limit performance when strong scaling.
  • Efficiency: One-sided primitives let the initiating process or GPU thread supply all the information needed to complete a data transfer, lowering per-operation overhead.
  • Versatility: Data in the global address space can be accessed through GPU-initiated operations, CPU-initiated operations, and operations on CUDA® streams.
  • Scalability: The global address space spans the memory of multiple GPUs, providing efficient and scalable communication across NVIDIA GPU clusters.
  • Developer-Friendly: Based on OpenSHMEM, with an LLVM IR-compliant bitcode device library for integration with MLIR-compliant compiler toolchains and Python DSLs.