NVIDIA Merlin NVTabular

NVIDIA Merlin NVTabular accelerates feature engineering and preprocessing for GPU-accelerated recommender systems.

Vendor

NVIDIA

Company Website

Figure: NVIDIA Merlin components interoperability
Product details

NVIDIA Merlin NVTabular is a feature engineering and preprocessing library designed to manipulate terabyte-scale recommender system datasets and significantly reduce data preparation time. It provides efficient feature transformations, preprocessing, and a high-level abstraction that accelerates computation on GPUs using the RAPIDS cuDF library.
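
The snippet below is a minimal sketch of a typical NVTabular preprocessing workflow: categorical columns are encoded, continuous columns are normalized, and the result is written back to Parquet. The column names ("user_id", "item_id", "price", "age", "label") and file paths are hypothetical placeholders, and exact APIs can vary between NVTabular releases.

    import nvtabular as nvt
    from nvtabular import ops

    # Declare the transformation graph with the >> operator: categorical
    # columns are integer-encoded with Categorify, continuous columns are
    # imputed and standardized.
    cat_features = ["user_id", "item_id"] >> ops.Categorify()
    cont_features = ["price", "age"] >> ops.FillMissing() >> ops.Normalize()
    workflow = nvt.Workflow(cat_features + cont_features + ["label"])

    # Dataset wraps Parquet/CSV files and streams them in GPU-sized chunks,
    # so the full dataset never has to fit in memory.
    train = nvt.Dataset("train.parquet")  # hypothetical path

    # fit() gathers statistics (category mappings, means/stds); transform()
    # applies them and writes the processed output to Parquet.
    workflow.fit(train)
    workflow.transform(train).to_parquet("processed/")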

Features

  • Fast Feature Transforms: NVTabular's fast feature transforms reduce data prep time and simplify deploying recommender models to production. With NVTabular's recommender-focused APIs, data scientists and machine learning engineers can quickly process datasets of any size, run more experiments, and avoid being bound by CPU or GPU memory limits. NVTabular also supports multi-hot categorical features and vector (list) continuous features to ease feature engineering.
  • Interoperability with Open Source: Data scientists and machine learning engineers use a mix of methods, tools, libraries, and frameworks, including open source. NVTabular natively supports common tabular data formats, including comma-separated values (CSV), Apache Parquet, Apache ORC, and Apache Avro. Its data loaders are optimized for TensorFlow (TF), PyTorch, and Merlin HugeCTR (see the data loader sketch after this list). All Merlin components, including NVTabular, are interoperable with open source.
  • Accelerated on GPUs: NVTabular provides a high-level abstraction that accelerates computation on GPUs using the RAPIDS cuDF library. Its support for multi-GPU and multi-node scaling with Dask-CUDA and dask.distributed accelerates distributed parallelism (see the multi-GPU sketch after this list).
  • High Performance: NVTabular's multi-GPU support, built on RAPIDS cuDF, Dask, and Dask-cuDF, enables a high-performance, recommender-specific pipeline. Processing the 1.3 TB Criteo Terabyte dataset, NVTabular delivers a 95x speedup on a multi-GPU NVIDIA DGX A100 compared to Spark on a four-node, 96 vCPU-core CPU cluster. Scaling from one to eight NVIDIA A100 GPUs yields a further 5.3x speedup, from 10 minutes on 1x A100 to 1.9 minutes on 8x A100.
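
As a hedged illustration of the framework-optimized data loaders mentioned above, the sketch below feeds NVTabular-processed Parquet files into a Keras model. The import path and argument names reflect one NVTabular release line and may differ in newer Merlin packages; the column names and batch size are illustrative assumptions.

    from nvtabular.loader.tensorflow import KerasSequenceLoader  # path may differ by release

    # Stream processed Parquet directly to the GPU in large batches,
    # bypassing slower framework-native input pipelines.
    train_loader = KerasSequenceLoader(
        "processed/*.parquet",            # output of the workflow sketched earlier
        batch_size=65536,                 # illustrative; tune for your GPU memory
        label_names=["label"],
        cat_names=["user_id", "item_id"],
        cont_names=["price", "age"],
        engine="parquet",
        shuffle=True,
    )

    # model.fit(train_loader, epochs=1)   # plug into an existing Keras model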
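
The following is a rough sketch of multi-GPU scaling with Dask-CUDA and dask.distributed on a single node. Cluster sizing, the part_size value, and file paths are illustrative assumptions, and the way the Dask client is attached to a workflow has changed across NVTabular versions.

    import nvtabular as nvt
    from nvtabular import ops
    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client

    # One Dask worker per visible GPU; dask.distributed coordinates them.
    # Recent NVTabular releases pick up the active client automatically.
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Partition the input so each chunk fits comfortably in a single GPU's
    # memory; fit/transform then run across all workers in parallel.
    train = nvt.Dataset("train/*.parquet", part_size="512MB")

    features = (["user_id", "item_id"] >> ops.Categorify()) + (
        ["price", "age"] >> ops.FillMissing() >> ops.Normalize()
    )
    workflow = nvt.Workflow(features + ["label"])
    workflow.fit(train)
    workflow.transform(train).to_parquet("processed/")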

Benefits

  • Efficiency: NVTabular drastically reduces the time required for data preparation, allowing data scientists and machine learning engineers to focus on model development and experimentation.
  • Scalability: NVTabular supports large-scale datasets and multi-GPU environments, making it suitable for handling terabytes of data efficiently.
  • Flexibility: NVTabular is compatible with various open-source tools and frameworks, providing flexibility in integrating with existing workflows.
  • Performance: NVTabular achieves high-speed data processing, enhancing the overall efficiency of recommender systems and providing significant speedups in data preparation and model deployment.