
Vendor
Alibaba Cloud




Overview
A data lake is a centralized repository for big data and AI workloads that lets you store structured and unstructured data at any scale. Data Lake Formation (DLF) is a key component of the cloud-native data lake framework and provides an easy way to build a cloud-native data lake. DLF integrates seamlessly with a variety of compute engines. It lets you centrally manage data lake metadata and enforce enterprise-class permissions.
Benefits
• Easy data collection: systematically collects structured, semi-structured, and unstructured data and supports massive-scale storage.
• Flexible architecture: separates compute from storage, so you can provision resources on demand at low cost. This improves data processing efficiency and helps you keep pace with rapidly changing business requirements.
• Easy data management: provides unified data storage, stores hot and cold data separately, and manages the data lifecycle. This addresses a variety of O&M issues, such as the difficulty of copying data across clusters.
• Easy value extraction: connects to a variety of data computing and analytics platforms. This breaks down data silos and provides insight into business value.
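The hot/cold separation described under "Easy data management" can be pictured as a tiering rule keyed on how recently data was accessed. The Python sketch below assumes a hypothetical 30-day threshold and per-file last-access timestamps. `DataFile`, `storage_tier`, `plan_tiering`, and `HOT_WINDOW` are illustrative names, not part of DLF. Actual lifecycle rules are configured in the product and may use different criteria.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical rule: data untouched for longer than this window is treated as cold.
# This threshold is an assumption for illustration, not a DLF default.
HOT_WINDOW = timedelta(days=30)


@dataclass
class DataFile:
    path: str
    last_accessed: datetime


def storage_tier(f: DataFile, now: datetime) -> str:
    """Return "hot" if the file was accessed within HOT_WINDOW, else "cold"."""
    return "hot" if now - f.last_accessed <= HOT_WINDOW else "cold"


def plan_tiering(files: List[DataFile], now: datetime) -> Dict[str, List[str]]:
    """Group file paths by target tier so each group can be moved in one batch."""
    plan: Dict[str, List[str]] = {"hot": [], "cold": []}
    for f in files:
        plan[storage_tier(f, now)].append(f.path)
    return plan


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    files = [
        DataFile("sales/2024-05.parquet", now - timedelta(days=3)),
        DataFile("sales/2023-12.parquet", now - timedelta(days=90)),
    ]
    print(plan_tiering(files, now))
```

Batching paths by tier, rather than moving files one at a time, reflects the idea that lifecycle management works on groups of data. A real system would also weigh factors such as file size and storage cost.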
Features
- Data Ingestion: supports multiple data types and data ingestion channels, and allows you to cleanse data in a centralized manner.
- Metadata Service: intelligent metadata discovery that collects metadata from different data sources for centralized management.
- Permission Management: enterprise-class data permission management that allows you to set permissions on databases, tables, and fields.
- Multi-engine Access: supports multiple upper-layer compute engines, helping you deploy an end-to-end data lake solution.
- Open Ecosystem: compatible with the Hive metastore, and provides APIs in multiple programming languages for easy integration.
- Data Acceleration: JindoFS-based data acceleration for high-performance data lake analytics.
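"Permission Management" allows permissions to be set on databases, tables, and fields. The minimal Python sketch below shows one way such hierarchical grants might be checked. It rests on a loudly stated assumption: a grant on a database or table covers everything inside it. DLF's actual permission semantics may differ. `Resource`, `PermissionStore`, and the `"SELECT"` privilege string are hypothetical and are not the DLF API.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Set, Tuple


@dataclass(frozen=True)
class Resource:
    """A database, a table in that database, or a column (field) in that table."""
    database: str
    table: Optional[str] = None
    column: Optional[str] = None

    def __post_init__(self) -> None:
        if self.column is not None and self.table is None:
            raise ValueError("a column must belong to a table")

    def scopes(self) -> Iterator["Resource"]:
        """Yield this resource and every enclosing scope, narrowest first."""
        if self.column is not None:
            yield self
        if self.table is not None:
            yield Resource(self.database, self.table)
        yield Resource(self.database)


class PermissionStore:
    """Hypothetical in-memory grant store (illustrative, not the DLF API)."""

    def __init__(self) -> None:
        self._grants: Set[Tuple[str, str, Resource]] = set()

    def grant(self, principal: str, privilege: str, resource: Resource) -> None:
        self._grants.add((principal, privilege, resource))

    def is_allowed(self, principal: str, privilege: str, resource: Resource) -> bool:
        """Assumed rule: a grant on the resource or any enclosing scope applies."""
        return any(
            (principal, privilege, scope) in self._grants
            for scope in resource.scopes()
        )


if __name__ == "__main__":
    store = PermissionStore()
    store.grant("analyst", "SELECT", Resource("sales", "orders", "amount"))
    store.grant("admin", "SELECT", Resource("sales"))
    print(store.is_allowed("analyst", "SELECT", Resource("sales", "orders", "amount")))  # True
    print(store.is_allowed("analyst", "SELECT", Resource("sales", "orders", "email")))   # False
    print(store.is_allowed("admin", "SELECT", Resource("sales", "orders", "email")))     # True
```

Storing grants in a set keyed on the exact scope keeps each lookup cheap. A field-level check walks at most three scopes: field, table, database.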