
NVIDIA nvCOMP
Vendor
NVIDIA
Company Website

Product details
NVIDIA nvCOMP is a high-speed data compression and decompression library optimized for NVIDIA GPUs. Data compression is an essential part of applications for AI training, high-performance computing (HPC), data science, and analytics. As these applications grow in size and complexity, they demand highly optimized and performant compression and decompression capabilities.
Features
- Blackwell-Optimized Performance: Starting with version 4.2, nvCOMP supports the NVIDIA Blackwell platform. It leverages Blackwell’s dedicated hardware Decompression Engine (DE) to achieve up to 600 GB/s decompression throughput for standard formats. Additionally, Blackwell’s DE minimizes latency and improves efficiency by supporting fused copy-decompress operations and enabling overlap of decompression with compute.
- Compression Format Support: nvCOMP supports a wide range of compression and decompression algorithms, including standard formats such as Snappy, ZSTD, Deflate, and LZ4. Additionally, it provides GPU-optimized formats like Bitcomp, GDeflate, gANS, and Cascaded, which are highly optimized for NVIDIA GPUs and available within a single library.
- Python APIs Support: nvCOMP offers comprehensive Python APIs that provide streamlined interfaces for GPU-accelerated compression and decompression, simplifying integration and interoperability with frameworks such as PyTorch and TensorFlow.
- nvCOMPDx Device APIs: nvCOMPDx provides device-side API extensions for performing compression and decompression inside your own CUDA kernels.
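To make the standard-format support above concrete, the sketch below runs a CPU-side Deflate round trip using Python's built-in zlib module. With a negative `wbits` value, zlib produces and consumes raw Deflate streams with no zlib header, which is the same standard bitstream format that nvCOMP's Deflate codec accelerates on the GPU. This is a CPU baseline for illustration only, not nvCOMP itself:

```python
# CPU-side sketch of a raw-Deflate round trip; nvCOMP provides the
# GPU-accelerated equivalent for this same standard format.
import zlib

def deflate(data: bytes) -> bytes:
    # wbits=-15 selects a raw Deflate stream (no zlib header/trailer)
    c = zlib.compressobj(level=9, wbits=-15)
    return c.compress(data) + c.flush()

def inflate(data: bytes) -> bytes:
    d = zlib.decompressobj(wbits=-15)
    return d.decompress(data) + d.flush()

payload = b"nvCOMP supports standard formats such as Deflate. " * 100
compressed = deflate(payload)
assert inflate(compressed) == payload
print(f"{len(payload)} bytes -> {len(compressed)} bytes compressed")
```

Because the bitstream is a standard format, data compressed on the CPU this way can in principle be decompressed by a GPU Deflate decoder, and vice versa.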
Benefits
- Optimized Performance: Achieves up to 600 GB/s decompression throughput for standard formats, minimizing latency and improving efficiency.
- Versatile Compression Formats: Supports a wide range of standard and GPU-optimized compression formats, providing flexibility for various applications.
- Streamlined Integration: Comprehensive Python APIs enable simplified integration with popular frameworks like PyTorch and TensorFlow.
- Enhanced Efficiency: Device-side API extensions allow for efficient compression and decompression within CUDA kernels, optimizing memory bandwidth and reducing storage overhead.
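As an illustration of the streamlined Python integration described above, here is a minimal sketch of a compress/decompress round trip. It assumes the `nvidia-nvcomp` package is installed and a CUDA-capable GPU is available; the class and method names reflect NVIDIA's published Python bindings but should be verified against the current nvCOMP documentation:

```python
# Minimal sketch: LZ4 round trip via nvCOMP's Python API.
# Assumes the nvidia-nvcomp package and a CUDA-capable GPU.
import numpy as np
import nvidia.nvcomp as nvcomp

payload = np.frombuffer(b"nvCOMP round-trip example " * 1000, dtype=np.uint8)

codec = nvcomp.Codec(algorithm="LZ4")         # other options include "Snappy", "Zstd", "Deflate"
device_arr = nvcomp.as_array(payload).cuda()  # stage the input buffer on the GPU

compressed = codec.encode(device_arr)         # GPU-accelerated compression
restored = codec.decode(compressed)           # GPU-accelerated decompression
```

Because the codec operates on array objects that interoperate with the CUDA array interface, the same pattern composes with GPU tensors from frameworks such as PyTorch without a round trip through host memory.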