
NVSHMEMNVIDIA
NVSHMEM™ is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA® streams.
Vendor
NVIDIA
Company Website
Product details
NVSHMEM™ is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. It creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA® streams.
Features
- Efficient, Strong Scaling: Enables long-running kernels that include both communication and computation, reducing overheads that can limit an application’s performance when strong scaling.
- Low Overhead: One-sided communication primitives reduce overhead by allowing the initiating process or GPU thread to specify all information required to complete a data transfer.
- Naturally Asynchronous: Asynchronous communications make it easier for programmers to interleave computation and communication, thereby increasing overall application performance.
- Platform Support: Supports Blackwell SM100 architecture on NVLINK5 connected B200-based systems.
- Advanced Algorithms: Includes one-shot and two-shot NVLINK SHARP (NVLS) allreduce algorithms for various datatypes on NVLINK4 and NVLINK5 enabled platforms.
- LLVM IR-Compliant: New LLVM IR-compliant bitcode device library to support MLIR-compliant compiler toolchain integration on new and upcoming Python DSLs.
Benefits
- Increased Performance: Offloads intensive flow vector computation to dedicated hardware on the GPU, freeing up GPU and CPU cycles for other tasks.
- Efficiency: Reduces computational complexity and improves real-time video processing capabilities.
- Versatility: Suitable for a wide range of applications, including video analytics, VR experiences, and video playback enhancement.
- Scalability: Supports high-performance video processing, including frame rate up-conversion and object tracking.
- Developer-Friendly: Provides comprehensive support for GPU-accelerated video workflows with rich APIs and reusable code.