
Vendor
Alibaba Cloud
Company Website


Overview
PAI-Lingjun Intelligent Computing Service is a PaaS service for large-scale deep learning and integrated intelligent computing. It is available in two editions: the Serverless Edition on the Alibaba Cloud public cloud and the Exclusive Edition. Built on integrated software and hardware optimization, PAI-Lingjun Intelligent Computing Service provides a high-performance heterogeneous computing base together with full-process AI engineering capabilities.
Benefits
The core benefits of PAI-Lingjun Intelligent Computing Service are high performance, high efficiency, and high resource utilization, meeting the demands of high-performance computing workloads such as foundation model training, autonomous driving, scientific research, and finance.
- Serverless: The Lingjun Serverless Edition helps you quickly set up and run AI computing tasks. It manages complex heterogeneous systems with automatic operations and maintenance (O&M) and integrates seamlessly with Alibaba Cloud computing, storage, and network services.
- High-Performance RDMA Network: Alibaba Cloud's high-performance Remote Direct Memory Access (RDMA) networks greatly accelerate AI training, with high-speed, low-latency transmission at 800 Gbit/s and GPU direct connection technologies that improve transmission stability and security.
- Efficient CPFS Storage System: Cloud Parallel File System (CPFS) uses a fully parallel storage architecture and supports the POSIX, MPI-IO, and Network File System (NFS) protocols. A single cluster supports data throughput of up to 2 TB/s and 30 million IOPS, providing efficient and reliable storage services for AI training.
- Comprehensive AI Acceleration: Our distributed training acceleration engine provides dataset acceleration, computing acceleration, algorithm optimization, scheduling algorithms, and resource optimization, ensuring computing power is fully utilized and comprehensively improving the speed and efficiency of AI training and inference.
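Because CPFS exposes standard POSIX semantics, training jobs can read and write datasets with ordinary file I/O rather than a storage-specific SDK. A minimal sketch of that idea (the CPFS mount point is hypothetical, so a temporary directory stands in here to keep the snippet runnable anywhere):

```python
import os
import tempfile

# On a Lingjun node, a CPFS file system would typically be mounted at a
# path such as /mnt/cpfs (hypothetical); a temp directory substitutes here.
cpfs_root = tempfile.mkdtemp()

sample_path = os.path.join(cpfs_root, "dataset.txt")

# Plain POSIX file I/O: no storage-specific client library is required.
with open(sample_path, "w") as f:
    f.write("sample-record\n")

with open(sample_path) as f:
    record = f.read().strip()

print(record)
```

The same open/read/write calls work unchanged whether the path points at local disk or a mounted CPFS file system, which is what POSIX protocol support means in practice.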
Features
Next-generation AI Computing Platform That Provides Large-scale AI Computing Power
- Enterprise-Class AI Development Platform: Full-process AI engineering capabilities such as AI development and AI training, with support for AI role management and computing resource management.
- One-Stop AI Computing Services: You can activate and manage compute clusters, high-performance storage systems, container services, and AI development platforms with a few clicks, perform lifecycle management, and quickly run AI computing tasks with fully automated O&M.
- Easy-to-Use Distributed Computing: Foundation model training tasks can be distributed to run automatically and concurrently after simple configurations. The optimized computing, network, communication, and storage architectures improve resource utilization and accelerate model training, significantly reducing costs and time.
- Cluster Management: You can quickly create clusters in the console or by calling API operations, monitor clusters, and troubleshoot host and service errors in a visualized manner with a wide range of monitoring metrics, events, and statistics. You can also perform root cause analysis and performance tuning with associated diagnostic and analysis tools for hosts, networks, and tasks.
- RDMA Network: High-performance RDMA computing, storage, and control networks enable high-performance, highly available access to Alibaba Cloud services, with strong security isolation, minute-level deployment, continuous acceleration, and high reliability.
- High-Performance Storage: The parallel I/O architecture improves storage performance. A single cluster supports data throughput of up to 2 TB/s and 30 million IOPS, and can communicate with cloud and on-premises storage systems.
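The data-parallel pattern behind distributed training, which the platform automates, can be sketched conceptually: each worker computes a gradient on its own data shard, and the results are averaged (an all-reduce, the collective that RDMA-backed networks accelerate). This is a stdlib-only illustration of the pattern, not the actual Lingjun API or a real training framework; the loss, data, and worker count are all made up for the example:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model: fit y = w * x by minimizing mean squared error.
# Gradient of mean((w*x - y)^2) with respect to w is mean(2*x*(w*x - y)).

def local_gradient(w, shard):
    # Each worker computes the gradient on its own data shard.
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    # All-reduce step: average gradients across workers.
    return sum(grads) / len(grads)

def train_step(w, shards, lr=0.01):
    # Workers run concurrently, then synchronize on the averaged gradient.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        grads = list(pool.map(lambda s: local_gradient(w, s), shards))
    return w - lr * all_reduce_mean(grads)

# Synthetic data from y = 3x, split round-robin across 4 workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = train_step(w, shards)

print(round(w, 2))  # converges toward 3.0
```

In a real job, a framework's collective communication (rather than a thread pool) performs the all-reduce across nodes; the "simple configuration" the platform refers to is choosing the worker topology while the scheduling, launch, and synchronization are handled for you.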