Modin
This library enables distributed DataFrame processing and is fully compatible with the pandas API.
Vendor
Intel Corporation
Product details
Accelerate pandas DataFrame Processing
Modin is a drop-in replacement for pandas that lets data scientists scale to distributed DataFrame processing without changing their existing API code. Beginning with the 2024.2 release of AI Tools, Intel upstreams all optimizations to open-source Modin. Using this library, you can:
- Process terabytes of data on a single workstation
- Scale from a single workstation to the cloud using the same code
- Focus more on data analysis and less on learning new APIs
Features
- **Accelerated DataFrame Processing:**
  - Speed up the extract, transform, and load (ETL) process for large DataFrames.
  - Automatically use all of the processing cores available on your machine.
- **Optimized for Intel Hardware:**
  - Scale to terabytes of data on a single data science workstation.
  - Analyze large datasets (over one billion rows) using performant end-to-end analytics frameworks that take advantage of the compute power of current and future Intel hardware.
- **Compatible with Existing APIs and Engines:**
  - Change one line of code to keep using your existing pandas API calls, no matter the scale: instead of `import pandas as pd`, use `import modin.pandas as pd`.
  - Use Ray, Dask, or Message Passing Interface (MPI) compute engines to distribute the data without writing any distribution code yourself.
  - Continue to use the rest of your Python ecosystem code, such as NumPy, XGBoost, and scikit-learn.
  - Use the same notebook to scale from your local machine to the cloud.
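The one-line import change described above can be sketched as follows. This is a minimal illustration, not an official sample: the `summarize_sales` helper, the column names, and the sample values are hypothetical. It assumes Modin and at least one supported engine are installed. The fallback to plain pandas is included only so the sketch runs in an environment without Modin.

```python
# Assumption: Modin plus one compute engine (for example Ray or Dask) is installed.
# The ImportError fallback is illustrative only, so the sketch also runs on plain pandas.
try:
    import modin.pandas as pd  # the one-line change: replaces "import pandas as pd"
except ImportError:
    import pandas as pd


def summarize_sales(frame):
    """Return total revenue per region using ordinary pandas API calls.

    Hypothetical helper: nothing in it is Modin-specific, which is the point
    of a drop-in replacement.
    """
    frame = frame.assign(revenue=frame["units"] * frame["unit_price"])
    return frame.groupby("region")["revenue"].sum().sort_index()


# Small made-up dataset; real workloads would typically load data with pd.read_csv.
sales = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west"],
        "units": [10, 5, 3, 8],
        "unit_price": [2.0, 4.0, 2.0, 4.0],
    }
)

totals = summarize_sales(sales)
print(totals)
```

Everything after the import is unchanged pandas code, so the same function works on either library. When more than one engine is installed, Modin reads the `MODIN_ENGINE` environment variable (for example `ray` or `dask`) to decide which one to use.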