NVIDIA nvCOMP

NVIDIA nvCOMP is a high-speed data compression and decompression library optimized for NVIDIA GPUs. Data compression is an essential part of applications for AI training, high-performance computing (HPC), data science, and analytics. As these applications grow in size and complexity, they demand highly optimized and performant compression and decompression capabilities.

Vendor

NVIDIA

[Image: nvCOMP benchmark chart]

Features

  • Blackwell Optimized Performance: Starting with version 4.2, nvCOMP introduces support for the NVIDIA Blackwell platform. It leverages Blackwell’s dedicated hardware Decompression Engine (DE) to achieve up to 600 GB/s decompression throughput for standard formats. Additionally, Blackwell’s DE minimizes latency and improves efficiency by supporting fused copy-decompress operations and enabling overlap of decompress with compute.
  • Compression Format Support: nvCOMP supports a wide range of compression and decompression algorithms, including standard formats such as Snappy, ZSTD, Deflate, and LZ4. Additionally, it provides GPU-optimized formats like Bitcomp, GDeflate, gANS, and Cascaded, which are highly optimized for NVIDIA GPUs and available within a single library.
  • Python APIs Support: nvCOMP offers comprehensive Python APIs that provide streamlined interfaces for GPU-accelerated compression and decompression. This simplifies integration and interoperability with frameworks such as PyTorch and TensorFlow.
  • nvCOMPDx Device APIs: nvCOMPDx provides device-side API extensions for performing compression and decompression inside your CUDA kernels.

Benefits

  • Optimized Performance: Achieves up to 600 GB/s decompression throughput for standard formats on Blackwell's hardware Decompression Engine, minimizing latency and improving efficiency.
  • Versatile Compression Formats: Supports a wide range of standard and GPU-optimized compression formats, providing flexibility for various applications.
  • Streamlined Integration: Comprehensive Python APIs enable simplified integration with popular frameworks like PyTorch and TensorFlow.
  • Enhanced Efficiency: Device-side API extensions allow for efficient compression and decompression within CUDA kernels, optimizing memory bandwidth and reducing storage overhead.
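One practical consequence of supporting standard formats is interoperability with existing CPU toolchains: a Deflate stream, for instance, is the same bitstream whether it was produced on a CPU or a GPU. As a CPU-side illustration (using Python's stdlib zlib as a stand-in; running the GPU path requires nvCOMP itself and is not shown), a raw Deflate stream can be produced and consumed like this:

```python
# CPU-side illustration of the "standard format" idea: raw Deflate
# streams produced with stdlib zlib use the same bitstream format as
# Deflate codecs elsewhere. wbits=-15 selects raw Deflate (no zlib
# header or trailer).
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress `data` into a raw Deflate stream."""
    c = zlib.compressobj(wbits=-15)
    return c.compress(data) + c.flush()


def raw_inflate(stream: bytes) -> bytes:
    """Decompress a raw Deflate stream back into the original bytes."""
    return zlib.decompress(stream, wbits=-15)
```

This portability is what lets GPU-accelerated pipelines exchange compressed data with systems that have no GPU at all.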