
Vendor: Alibaba Cloud


Overview
DataWorks is a turnkey platform that provides efficient, secure, and reliable big data development and governance services on top of big data compute engines such as MaxCompute, E-MapReduce (EMR), and Hologres. It incorporates Alibaba's best practices in data mid-end construction and data governance to support digital transformation across industries. Tens of thousands of data and algorithm engineers at Alibaba Group use DataWorks every day, and it supports 99% of the group's data business.
Benefits
- Development Visualization: Drag and drop nodes to create a workflow. You can also edit and debug your code online and invite other developers to collaborate.
- Multiple Task Types: Supports data integration, MaxCompute SQL, MaxCompute MR, machine learning, and shell tasks.
- Strong Scheduling Capability: Runs millions of tasks concurrently and supports hourly, daily, weekly, and monthly schedules.
- Task Monitoring and Alarms: Monitors tasks and sends alarms when errors occur so that you can avoid service interruptions.
Features
Best Platform for Building Big Data Warehouses
DataWorks provides comprehensive services for building big data warehouses.
- A comprehensive solution that covers all aspects of data warehousing: DataWorks provides a full solution for data aggregation, data processing, data governance, and data services. Its features include data integration, data development, data quality, data protection, and data services.
- Separate development and production environments: You can debug code in the development environment before releasing your application to the production environment, which protects the stability and security of your service.
- End-to-end platform: Covers all development and debugging tasks in one place, eliminating the need to switch between different tools.
- Secure and reliable: Provides basic security mechanisms and lets you manage your data with multiple permission settings.
Stable and Efficient Scheduling System
Schedules millions of tasks concurrently to ensure the stability of your service.
- Stable and reliable: Provides a unified task scheduling platform that supports the scheduling of millions of tasks to streamline data processing.
- Visualized management: Provides a visual interface based on directed acyclic graphs (DAGs).
- Multiple scheduling periods: Allows you to create minute-level, hourly, daily, weekly, and monthly schedules.
- Monitoring and alarms: Allows you to create different types of alarms to monitor task status.
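To make the DAG idea concrete, the following is a minimal sketch of how a scheduler can derive a run order from task dependencies. It uses only the Python standard library and is not the DataWorks API; the task names and the `execution_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
# The task names are illustrative and are not taken from DataWorks.
workflow = {
    "ingest_orders": set(),
    "ingest_users": set(),
    "clean_orders": {"ingest_orders"},
    "join_orders_users": {"clean_orders", "ingest_users"},
    "daily_report": {"join_orders_users"},
}


def execution_waves(dag):
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as finished to unlock downstream tasks
    return waves


waves = execution_waves(workflow)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

Tasks in the same wave could run concurrently, while a downstream task such as `daily_report` waits until everything it depends on has finished.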
Collaborative Development Capability
Provides multiple roles and permissions so that users can collaborate efficiently.
- Multiple roles: Allows you to manage roles such as administrators, developers, maintenance personnel, and visitors, which reduces management costs.
- Collaborative development: Provides version management and a lock mechanism so that multiple developers can work together.
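As a rough illustration of how role-based permissions and an edit lock can work together, consider the sketch below. It does not reflect how DataWorks implements these features: the role names follow the list above, but the permission names, `is_allowed`, and `NodeLock` are assumptions made for this example.

```python
# Hypothetical role-to-permission mapping. The permission names are
# assumptions for illustration only.
ROLE_PERMISSIONS = {
    "administrator": {"view", "edit", "deploy", "manage_members"},
    "developer": {"view", "edit"},
    "maintenance": {"view", "deploy"},
    "visitor": {"view"},
}


def is_allowed(role, action):
    """Return True if the given role includes the requested action."""
    return action in ROLE_PERMISSIONS.get(role, set())


class NodeLock:
    """Edit lock: only one user may hold the lock on a node at a time."""

    def __init__(self):
        self.owner = None

    def acquire(self, user):
        # Succeeds if the node is unlocked or already locked by the same user.
        if self.owner is None or self.owner == user:
            self.owner = user
            return True
        return False

    def release(self, user):
        # Only the current owner can release the lock.
        if self.owner == user:
            self.owner = None
            return True
        return False


lock = NodeLock()
print(is_allowed("developer", "edit"))  # developers may edit
print(is_allowed("visitor", "edit"))    # visitors may only view
print(lock.acquire("alice"))            # alice takes the lock
print(lock.acquire("bob"))              # bob is refused while alice holds it
```

The permission check decides who may edit at all, and the lock prevents two permitted editors from overwriting each other's changes to the same node.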
Powered by the Strong Compute and Storage Capabilities of MaxCompute
MaxCompute provides exabyte-scale storage and strong compute capability.
- Large-scale computing and storage: Handles storage and compute needs from 100 GB up to exabyte scale.
- High stability: The service has run stably in Alibaba Cloud internal systems for more than three years and meets most of our offline analysis needs. It supports more than one hundred thousand computing tasks and processes hundreds of PB of data every day.
- Significantly lower TCO: More cost-effective than a private cloud. Better compute and storage capability can reduce hardware investment costs by 20 to 30 percent.
- Secure and reliable: Multiple sandbox protections and a monitoring system help ensure the security of your data.
Big Data Security Management
Big Data Security Management provides features such as data asset identification, sensitive data identification, data classification and masking, data access monitoring, risk early warning, and risk auditing.
- Sensitive data identification: Uses machine learning algorithms to automatically identify sensitive data in your system and visually display the types, distribution, and amount of that data. Custom data types can also be identified.
- Accurate data classification: Lets you classify information and create custom data types for better data management.
- Flexible data masking: Provides multiple ways to mask data. Both static and dynamic data masking are supported.
- Risk monitoring and auditing of suspicious operations: Uses multi-dimensional correlation analysis and algorithms to identify exceptions and suspicious operations, and provides early warnings and visualized auditing.
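The difference between static and dynamic masking can be sketched in a few lines. This is a simplified illustration rather than the DataWorks implementation: it uses a regular expression instead of machine learning, and the sample rows, `mask_email`, and `read_row` are hypothetical.

```python
import re

# Simplified email pattern, used here only for illustration.
EMAIL = re.compile(
    r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def mask_email(text):
    """Keep the first character of the local part and the domain; hide the rest."""
    return EMAIL.sub(r"\1***@\2", text)


# Hypothetical sample data.
rows = [
    {"id": 1, "contact": "alice@example.com"},
    {"id": 2, "contact": "bob@example.org"},
]

# Static masking: write a masked copy of the data, for example for a test environment.
masked_copy = [{**row, "contact": mask_email(row["contact"])} for row in rows]


def read_row(row, can_view_sensitive):
    """Dynamic masking: stored data stays unchanged and is masked at read time."""
    if can_view_sensitive:
        return dict(row)
    return {**row, "contact": mask_email(row["contact"])}


print(masked_copy)
print(read_row(rows[0], can_view_sensitive=False))
print(read_row(rows[0], can_view_sensitive=True))
```

Static masking suits data that leaves its original environment, while dynamic masking keeps a single stored copy and varies what each reader sees.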