
Apache SystemDS
The Apache Software Foundation
Apache SystemDS is an open-source machine learning system for end-to-end data science workflows. It supports data cleaning, feature engineering, scalable training, and deployment through high-level scripting and hybrid execution plans that span local and distributed environments.
Vendor
The Apache Software Foundation


Product details
Apache SystemDS
Apache SystemDS is an open-source machine learning system designed for the end-to-end data science lifecycle. It supports data integration, cleaning, feature engineering, scalable model training, and deployment. SystemDS provides high-level declarative languages and hybrid execution plans optimized for both local and distributed environments, making it suitable for data scientists, engineers, and researchers across various domains.
Features
- Declarative scripting languages with R-like and Python-like syntax.
- Built-in functions for complex machine learning tasks.
- Support for multiple execution modes: Standalone, Spark Batch, Spark MLContext, and JMLC (Java Machine Learning Connector).
- Automatic optimization based on data and cluster characteristics.
- Entity resolution primitives and customizable pipelines.
- Integration with Java, Python, Hadoop, Spark, and CUDA.
- Support for linear algebra, statistics, classification, clustering, and deep learning algorithms.
Capabilities
- Hybrid execution plans combining in-memory and distributed operations.
- Scalable from single-node to large Spark clusters.
- Efficient use of CPU and GPU resources.
- Custom algorithm development using DML scripting.
- Flexible deployment via command-line, APIs, or embedded systems.
- Extensible architecture for integrating new algorithms and data sources.
- Supports structured and unstructured data processing.
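To make the idea of a hybrid execution plan more concrete, the following Python sketch shows one way a per-operation choice between in-memory and distributed execution could be made from data characteristics. It is an illustration only and not SystemDS code: the names `MatrixStats`, `estimate_bytes`, and `choose_backend`, the 12-byte cost per non-zero value, and the memory budget are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Sequence

BYTES_PER_DOUBLE = 8       # assumption: dense cells are 64-bit floats
BYTES_PER_NONZERO = 12     # assumption: value (8 bytes) plus column index (4 bytes)


@dataclass(frozen=True)
class MatrixStats:
    """Hypothetical summary of a matrix: shape and fraction of non-zero cells."""
    rows: int
    cols: int
    sparsity: float  # in [0, 1]


def estimate_bytes(m: MatrixStats) -> int:
    """Rough size estimate: the smaller of a dense and a sparse layout."""
    cells = m.rows * m.cols
    dense = cells * BYTES_PER_DOUBLE
    sparse = round(cells * m.sparsity) * BYTES_PER_NONZERO
    return min(dense, sparse)


def choose_backend(inputs: Sequence[MatrixStats],
                   output: MatrixStats,
                   budget_bytes: int) -> str:
    """Run locally if all operands fit in the memory budget, else distribute."""
    total = sum(estimate_bytes(m) for m in inputs) + estimate_bytes(output)
    return "local" if total <= budget_bytes else "distributed"


if __name__ == "__main__":
    one_gib = 1024 ** 3
    small = MatrixStats(rows=1_000, cols=1_000, sparsity=1.0)
    large = MatrixStats(rows=10_000_000, cols=1_000, sparsity=1.0)
    print(choose_backend([small], small, one_gib))
    print(choose_backend([large], large, one_gib))
```

Under these assumptions, a single script can mix both kinds of operations: small intermediates stay in memory on one node, while operations whose operands exceed the budget are sent to the cluster.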
Benefits
- Accelerates development of machine learning workflows.
- Reduces complexity with high-level scripting and automation.
- Enhances performance through adaptive optimization.
- Enables reproducible and maintainable data science pipelines.
- Open-source and backed by the Apache Software Foundation.
- Suitable for academic research, enterprise analytics, and production ML systems.
- Active community and comprehensive documentation.