Apache SystemDS is an open-source machine learning system for end-to-end data science workflows. It supports data cleaning, feature engineering, scalable training, and deployment through high-level scripting, with hybrid execution plans that span local and distributed environments.

Vendor

The Apache Software Foundation

Product details

Apache SystemDS

Apache SystemDS is an open-source machine learning system designed for the end-to-end data science lifecycle. It supports data integration, cleaning, feature engineering, scalable model training, and deployment. SystemDS provides high-level declarative languages and generates hybrid execution plans optimized for both local and distributed environments, making it suitable for data scientists, engineers, and researchers across various domains.

Features

  • Declarative scripting languages with R-like and Python-like syntax.
  • Built-in functions for complex machine learning tasks.
  • Support for multiple execution modes: Standalone, Spark Batch, Spark MLContext, and JMLC.
  • Automatic optimization based on data and cluster characteristics.
  • Entity resolution primitives and customizable pipelines.
  • Integration with Java, Python, Hadoop, Spark, and CUDA.
  • Support for linear algebra, statistics, classification, clustering, and deep learning algorithms.

Capabilities

  • Hybrid execution plans combining in-memory and distributed operations.
  • Scalable from single-node to large Spark clusters.
  • Efficient use of CPU and GPU resources.
  • Custom algorithm development using DML scripting.
  • Flexible deployment via command-line, APIs, or embedded systems.
  • Extensible architecture for integrating new algorithms and data sources.
  • Supports structured and unstructured data processing.
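
To make the idea of a hybrid execution plan concrete, here is a minimal Python sketch of choosing between in-memory and distributed execution from a memory estimate. It is a toy illustration only and is not SystemDS's actual optimizer or API: the names MatrixStats, estimate_bytes, and choose_backend, the per-cell byte costs, and the 1 GiB budget are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MatrixStats:
    """Hypothetical metadata a planner might know before running an operation."""
    rows: int
    cols: int
    sparsity: float  # fraction of non-zero cells, between 0.0 and 1.0


def estimate_bytes(m: MatrixStats) -> int:
    """Rough size estimate (assumed costs): 8 bytes per dense cell,
    16 bytes per non-zero in a sparse layout; take the cheaper of the two."""
    cells = m.rows * m.cols
    dense = cells * 8
    sparse = int(cells * m.sparsity * 16)
    return min(dense, sparse)


def choose_backend(inputs: Iterable[MatrixStats],
                   output: MatrixStats,
                   memory_budget_bytes: int) -> str:
    """Pick 'local' when inputs plus output fit in the budget, else 'distributed'."""
    needed = sum(estimate_bytes(m) for m in inputs) + estimate_bytes(output)
    return "local" if needed <= memory_budget_bytes else "distributed"


if __name__ == "__main__":
    budget = 2 ** 30  # assumed 1 GiB memory budget for the local process
    gram = MatrixStats(100, 100, 1.0)  # shape of t(X) %*% X for 100 columns

    small_x = MatrixStats(1_000, 100, 1.0)
    huge_x = MatrixStats(100_000_000, 100, 1.0)

    print(choose_backend([small_x], gram, budget))
    print(choose_backend([huge_x], gram, budget))
```

The same operation lands on a different backend depending only on the estimated data size, which is the intuition behind scaling from a single node to a cluster without changing the script. A real planner would also account for cluster characteristics, intermediate results, and per-operator costs.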

Benefits

  • Accelerates development of machine learning workflows.
  • Reduces complexity with high-level scripting and automation.
  • Enhances performance through adaptive optimization.
  • Enables reproducible and maintainable data science pipelines.
  • Open-source and backed by the Apache Software Foundation.
  • Suitable for academic research, enterprise analytics, and production ML systems.
  • Active community and comprehensive documentation.