TORQUE Resource Manager
Adaptive Computing

TORQUE is a scalable, fault-tolerant resource manager for HPC environments, supporting job control, scheduling, and system optimization.

Vendor

Adaptive Computing

Product details

TORQUE Resource Manager is an industry-standard solution for managing batch jobs and distributed computing resources in high-performance computing (HPC) environments. Adaptive Computing’s fully developed version of TORQUE (currently at version 7.0.0) includes support for multiple operating systems, enhanced fault tolerance, scalability, and integration with Moab® Workload Manager. TORQUE is used globally across government, academic, and commercial sites to optimize application performance and system utilization.

Features

  • OS Compatibility: Supports numerous Ubuntu versions, Red Hat 8, and SUSE 15.
  • MIG Support: Includes support for Multi-Instance GPU environments.
  • Extensive Testing: Validated with tens of thousands of tests across supported OS versions.
  • Fault Tolerance:
    • Additional failure condition checks.
    • Node health check script support.
  • Scheduling Interface:
    • Extended query and control interfaces for improved scheduler interaction.
    • Job statistics collection for completed jobs.
  • Scalability:
    • Enhanced server-to-MOM communication model.
    • Supports clusters with tens of thousands of nodes and jobs.
    • Handles jobs spanning hundreds of thousands of processors.
    • Multi-threading and TCP-based communication for high responsiveness.
  • Usability:
    • Extensive logging improvements.
    • Human-readable error messages.
  • Modular Add-ons:
    • Portal-based job submission.
    • Accounting and grid management.
    • Power management and high-throughput submission.
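The node health check script support listed under Fault Tolerance can be illustrated with a minimal sketch. It assumes the convention described in the TORQUE documentation, where a health check script reports a failure by printing a line that begins with the keyword ERROR followed by a message. The disk-space check, the 1 GiB threshold, the checked path, and the function names are hypothetical choices made only for illustration; they are not part of the product.

```python
import shutil

# Hypothetical threshold and path, chosen only for illustration.
MIN_FREE_BYTES = 1 * 1024**3  # 1 GiB
CHECK_PATH = "/"


def health_report(free_bytes: int, min_free_bytes: int = MIN_FREE_BYTES) -> str:
    """Return an empty string when healthy, else a line starting with ERROR."""
    if free_bytes < min_free_bytes:
        free_mib = free_bytes // 1024**2
        need_mib = min_free_bytes // 1024**2
        return f"ERROR low disk space: {free_mib} MiB free, {need_mib} MiB required"
    return ""


def main() -> None:
    # Measure free space on the checked path and print a report only on failure.
    usage = shutil.disk_usage(CHECK_PATH)
    report = health_report(usage.free)
    if report:
        print(report)


if __name__ == "__main__":
    main()
```

A site would typically register such a script in the compute node's MOM configuration so that it runs periodically; the exact configuration depends on the installed TORQUE version and should be taken from its documentation.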

Benefits

  • Ease of Use: Simplifies job submission with portals, templates, script builders, and web-based file management.
  • Customizability: Adapts to specific system configurations and organizational needs.
  • High Adoption: Widely used across global HPC installations.
  • Improved Reliability: Robust fault tolerance and health monitoring.
  • Enhanced Scheduler Control: Provides detailed job data and control interfaces.
  • Scalable Architecture: Efficiently manages large-scale clusters and workloads.
  • Operational Efficiency: Reduces administrative overhead and improves system performance.