Name: TORQUE Resource Manager
Brand: Adaptive Computing

TORQUE is a scalable, fault-tolerant resource manager for HPC environments, supporting job control, scheduling, and system optimization.

Vendor

Adaptive Computing

Company Website

https://adaptivecomputing.com/cherry-services/torque-resource-manager/

Product details

TORQUE Resource Manager is an industry-standard solution for managing batch jobs and distributed computing resources in high-performance computing (HPC) environments. Adaptive Computing’s fully developed version of TORQUE (currently 7.0.0) includes support for multiple operating systems, enhanced fault tolerance and scalability, and integration with Moab® Workload Manager. TORQUE is used at government, academic, and commercial sites worldwide to optimize application performance and system utilization.

Features

OS Compatibility: Supports numerous Ubuntu versions, Red Hat 8, and SUSE 15.
MIG Support: Supports NVIDIA Multi-Instance GPU (MIG) environments.
Extensive Testing: Validated with tens of thousands of tests across supported OS versions.
Fault Tolerance:
- Additional failure condition checks.
- Node health check script support.
Scheduling Interface:
- Extended query and control interfaces for improved scheduler interaction.
- Job statistics collection for completed jobs.
Scalability:
- Enhanced communication model between the server and the MOM (compute-node) daemons.
- Supports clusters with tens of thousands of nodes and jobs.
- Handles jobs spanning hundreds of thousands of processors.
- Multi-threading and TCP-based communication for high responsiveness.
Usability:
- Extensive logging improvements.
- Human-readable error messages.
Modular Add-ons:
- Portal-based job submission.
- Accounting and grid management.
- Power management and high-throughput submission.

Benefits

Ease of Use: Simplifies job submission with portals, templates, script builders, and web-based file management.
Customizability: Adapts to specific system configurations and organizational needs.
High Adoption: Widely used across global HPC installations.
Improved Reliability: Robust fault tolerance and health monitoring.
Enhanced Scheduler Control: Provides detailed job data and control interfaces.
Scalable Architecture: Efficiently manages large-scale clusters and workloads.
Operational Efficiency: Reduces administrative overhead and improves system performance.

Documents

TORQUE-Resource-Manager-Data-Sheet.pdf
