Moab / Torque Primer: A Gentle Introduction to Job Management
Moab and Torque are two software packages that work closely together and are used in combination at many HPC sites:
- Torque is an open-source resource manager that collects status and health information from compute nodes and keeps track of the jobs running in the system. It is also responsible for spawning the actual executables associated with a job, i.e. running them on the corresponding compute nodes.
- Moab is a commercial scheduler developed by Adaptive Computing (formerly Cluster Resources Inc.) that allocates resources to the jobs requesting them. To do so, it collects all the information that Torque (or another resource manager) can provide about currently running jobs, available nodes and other resources. Once Moab has scheduled resources for a job, it instructs Torque to execute the job on the allocated resources.
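To make the division of labor concrete, here is a toy Python model of the interaction described above. It is only an illustration under simplifying assumptions: the classes `ToyResourceManager` (standing in for Torque) and `ToyScheduler` (standing in for Moab) are hypothetical and have nothing to do with the real software's interfaces, and the first-fit placement is a deliberately naive stand-in for Moab's far richer scheduling policies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cores: int
    healthy: bool = True


@dataclass
class Job:
    job_id: str
    cores: int
    state: str = "queued"       # stays "queued" until the scheduler starts it
    node: Optional[str] = None  # node the job was placed on, if any


class ToyResourceManager:
    """Hypothetical stand-in for Torque: tracks nodes and jobs, launches jobs."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes: Dict[str, Node] = {n.name: n for n in nodes}
        self.jobs: Dict[str, Job] = {}

    def submit(self, job: Job) -> None:
        self.jobs[job.job_id] = job

    def status(self):
        # Report node state and the queued jobs, as the scheduler would see them.
        queued = [j for j in self.jobs.values() if j.state == "queued"]
        return list(self.nodes.values()), queued

    def start_job(self, job_id: str, node_name: str) -> None:
        # Placeholder for spawning the job's executable on the compute node.
        job, node = self.jobs[job_id], self.nodes[node_name]
        node.free_cores -= job.cores
        job.state, job.node = "running", node_name


class ToyScheduler:
    """Hypothetical stand-in for Moab: decides placement, then tells the RM to run."""

    def schedule(self, rm: ToyResourceManager) -> None:
        nodes, queued = rm.status()
        for job in queued:
            # Naive first-fit: pick the first healthy node with enough free cores.
            for node in nodes:
                if node.healthy and node.free_cores >= job.cores:
                    rm.start_job(job.job_id, node.name)
                    break


# Usage: one healthy 8-core node and one unhealthy 4-core node.
rm = ToyResourceManager([Node("n01", free_cores=8), Node("n02", free_cores=4, healthy=False)])
for job_id, cores in [("1", 6), ("2", 4), ("3", 2)]:
    rm.submit(Job(job_id, cores))
ToyScheduler().schedule(rm)
for job in rm.jobs.values():
    print(job.job_id, job.state, job.node)
```

In this run, jobs 1 and 3 start on `n01`, while job 2 stays queued: after job 1 starts, `n01` has only 2 free cores left, and `n02` is reported as unhealthy, so the scheduler never places work there.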
This collection of articles is intended as a very basic introduction to resource and job management with Moab and Torque. It provides a high-level practical starting point for new HPC system administrators who want to become more familiar with these systems. It is not intended to provide comprehensive and detailed descriptions of all of the systems’ features or their configuration.
The articles are organized into the following parts, which will be posted as they become available:
- Submitting Jobs
- Resources
- Queues
- Reservations
- Job Status
- Modifying Jobs
- Canceling Jobs