jacked.in

Moab / Torque Primer Part 2: Resources

The main purpose of Torque and Moab is to manage, monitor and schedule resources. At a very high level, the most obvious type of resource is the compute node. Each compute node, however, contributes resources at a much finer level to the resource pool, and resource monitors such as Torque typically track processors, main memory, storage, etc. In addition to these common resource types, nodes may also be associated with more specific ones, such as network bandwidth, software licenses or power consumption. For a general understanding, however, considering compute nodes and processors as resources is sufficient.

Torque is responsible for monitoring the resources provided by compute nodes. Each compute node runs a small daemon program called pbs_mom which collects all relevant resource information on the compute node it is running on and sends this information to Torque’s pbs_server which typically runs on the headnode of the cluster. The pbs_server then stores this information in an internal database which can be used by schedulers such as Moab to get real-time information about the availability of resources.

Torque provides access to its list of available resources through the pbsnodes command. Without any command-line arguments the command lists all nodes that Torque knows about and the resources it monitors for each node:

$ pbsnodes
compute-0-0.local
     state = job-exclusive
     np = 8
     ntype = cluster
     jobs = 0/45.cluster.local, 1/46.cluster.local, 2/47.cluster.local, 3/48.cluster.local, 4/49.cluster.local, 5/50.cluster.local, 6/51.cluster.local, 7/52.cluster.local
     status = opsys=linux,uname=Linux compute-0-0.local 2.6.9-42.ELsmp #1 SMP Tue Aug 15 10:35:26 BST 2006 x86_64,sessions=9878 9881 9882 9883 9889 9918 9924 9927,nsessions=8,nusers=1,idletime=16682405,totmem=12258160kb,availmem=10771408kb,physmem=8161596kb,ncpus=8,loadave=9.43,netload=1074777221096,state=free,jobs=31.cluster.local 46.cluster.local 45.cluster.local 47.cluster.local 49.cluster.local 48.cluster.local 50.cluster.local 51.cluster.local 52.cluster.local,varattr=,rectime=1250282323

...

compute-0-3.local
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-3.local 2.6.9-42.ELsmp #1 SMP Tue Aug 15 10:35:26 BST 2006 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=27314517,totmem=12258140kb,availmem=11926516kb,physmem=8161576kb,ncpus=8,loadave=0.00,netload=4289316359815,state=free,jobs=972217.cluster.local,varattr=,rectime=1250282329

For each node it lists the node’s state, the number of processors provided by the node, and other attributes such as available memory or the jobs currently executing on the node. In most cases the default pbsnodes output is too verbose, and a number of command-line switches can help filter the information.
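When the raw listing needs to be processed by a script, it can be turned into a dictionary per node. The following is a minimal sketch, assuming the output layout shown above (an unindented node name followed by indented "key = value" lines). The function names parse_pbsnodes and parse_status are hypothetical helpers, not part of Torque, and the sample text is a shortened copy of the listing above.

```python
def parse_pbsnodes(text):
    """Parse default pbsnodes output into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "...":
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = nodes.setdefault(stripped, {})
        elif current is not None and " = " in stripped:
            key, value = stripped.split(" = ", 1)
            current[key] = value
    return nodes


def parse_status(status):
    """Split the comma-separated status attribute into a dict of strings."""
    fields = {}
    for item in status.split(","):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


# Shortened sample based on the listing above (assumption: same layout).
SAMPLE = """\
compute-0-0.local
     state = job-exclusive
     np = 8
     ntype = cluster
     status = opsys=linux,nusers=1,totmem=12258160kb,availmem=10771408kb,ncpus=8,loadave=9.43,state=free

compute-0-3.local
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,nusers=0,totmem=12258140kb,availmem=11926516kb,ncpus=8,loadave=0.00,state=free
"""

nodes = parse_pbsnodes(SAMPLE)
free_nodes = [name for name, attrs in nodes.items() if attrs["state"] == "free"]
load = parse_status(nodes["compute-0-0.local"]["status"])["loadave"]
print(free_nodes, load)
```

In practice the text would come from running pbsnodes on the head node; all values stay strings here, so numeric fields such as np or loadave need an explicit conversion before comparison.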

In addition to reporting node states, pbsnodes may be used to set a node state manually. This is useful for maintenance or for debugging hardware or software problems. pbsnodes -o [nodename] marks a node as offline. Jobs that are currently running on the node are allowed to complete, but no new jobs will be allocated to it. In Moab terminology, a node marked as offline is considered drained. When taking a node offline, a comment or note should always be provided so that the reason can be easily determined later. The user-defined comment is set with the -N command-line option:

$ pbsnodes -o -N "installing new hard-drive" compute-0-5.local
$ pbsnodes -l -n
compute-0-5.local   offline                    installing new hard-drive

Once a node can be placed back into the pool of available resources its offline state needs to be cleared. This is done by using the -c option to pbsnodes. At the same time the comment can be removed from the node using -N "":

$ pbsnodes -c -N "" compute-0-5.local
$ pbsnodes -l -n free
...
compute-0-5.local    free
...
$ pbsnodes compute-0-5.local
compute-0-5.local
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-5.local 2.6.9-42.ELsmp #1 SMP Tue Aug 15 10:35:26 BST 2006 x86_64,sessions=18901,nsessions=1,nusers=1,idletime=36154599,totmem=12258140kb,availmem=11888652kb,physmem=8161576kb,ncpus=8,loadave=0.01,netload=3655128653370,state=free,varattr=,rectime=1250283674
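The offline and clear steps above lend themselves to a small wrapper that refuses to drain a node without a note. This is only a sketch: offline_command, clear_command and run are hypothetical names, and the only pbsnodes switches used are the -o, -c and -N options shown above. By default the wrapper prints the command instead of executing it.

```python
import shlex
import subprocess


def offline_command(node, note):
    """Build the argv that marks a node offline with a mandatory note."""
    if not note or not note.strip():
        raise ValueError("a note explaining the reason is required")
    return ["pbsnodes", "-o", "-N", note, node]


def clear_command(node):
    """Build the argv that clears the offline state and empties the note."""
    return ["pbsnodes", "-c", "-N", "", node]


def run(command, dry_run=True):
    """Print the command (dry run) or execute it on a host that has pbsnodes."""
    if dry_run:
        print(" ".join(shlex.quote(arg) for arg in command))
        return None
    return subprocess.run(command, check=True)


# Dry run only: nothing is executed here.
run(offline_command("compute-0-5.local", "installing new hard-drive"))
run(clear_command("compute-0-5.local"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the note contains spaces or quotes.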

In addition to Torque’s tools for managing nodes, Moab provides its own tools for inspecting how Moab sees the nodes and the resources they provide. This is done using the checknode command.

The checknode command is only used to report information about a node and cannot be used to modify any of the node’s parameters. A typical command output for a busy node looks like:

$ checknode compute-0-3.local
node compute-0-3.local
State:      Busy  (in current state for 00:32:26)
Configured Resources: PROCS: 8  MEM: 7970M  SWAP: 11G  DISK: 1M
Utilized   Resources: PROCS: 8  SWAP: 1189M
Dedicated  Resources: PROCS: 8
  MTBF(longterm):   INFINITY  MTBF(24h):   INFINITY
Vars:       RACK,SLOT,Rack
Opsys:      linux     Arch:      ---   
Speed:      1.00      CPULoad:   8.080
Network Load: 10.35 kB/s
Flags:      rmdetected
Network:    DEFAULT
Classes:    [queue1 8:8][queue2 0:8][queue3 8:8]
RM[cluster] TYPE=PBS  STATE=Busy
EffNodeAccessPolicy: SHARED

Total Time: 78:12:29:09  Up: 78:12:04:28 (99.98%)  Active: 13:20:15:23 (17.63%)

Reservations:
  res1.609x8  User  9:51:12 -> 11:51:12 (2:00:00)
    Blocked Resources@9:51:12     Procs: 8/8 (100.00%)  Mem: 0/7970 (0.00%)  Swap: 0/11970 (0.00%)  Disk: 0/1 (0.00%)
  66x8  Job:Running  -00:32:26 -> 1:27:34 (2:00:00)
  res1.614x8  User  1:09:51:12 -> 1:11:51:12 (2:00:00)
    Blocked Resources@1:09:51:12  Procs: 8/8 (100.00%)  Mem: 0/7970 (0.00%)  Swap: 0/11970 (0.00%)  Disk: 0/1 (0.00%)
Jobs:        66

The output provides a number of details about the compute node, including which resources are available on the node (Configured), which of those are currently in use (Utilized), and which have been dedicated to a job (Dedicated). Note that a node may report utilization of resources that are not explicitly dedicated. In the case above, 8 CPUs are dedicated to a job, but the node also reports 1189 MB of swap space in use. In most cases this is acceptable as long as the utilized resources do not reach the limit of physically available resources and the system load does not dramatically exceed the number of available processors. Ideally, a job should use no more than the resources dedicated to it. If it does, the job’s owner should be asked to adjust the resources requested by subsequent similar jobs.
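The comparison between utilized and dedicated resources can be automated. The sketch below assumes the three "Resources:" lines are formatted as in the output above; parse_resources and undedicated_usage are hypothetical helper names, and treating the G suffix as 1024 MB is an assumption about checknode's rounding rather than documented behaviour.

```python
import re

# Assumption: sizes are reported in MB, with an optional M or G suffix.
UNIT_MB = {"": 1, "M": 1, "G": 1024}

LINE_RE = re.compile(r"\s*(Configured|Utilized|Dedicated)\s+Resources:\s*(.*)")
ITEM_RE = re.compile(r"(\w+):\s*(\d+)([MG]?)")


def parse_resources(text):
    """Return {kind: {resource: amount}} for the three resource lines."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        kind, rest = match.groups()
        result[kind] = {
            name: int(amount) * UNIT_MB[unit]
            for name, amount, unit in ITEM_RE.findall(rest)
        }
    return result


def undedicated_usage(resources):
    """Return the amount by which utilization exceeds the dedicated amount."""
    used = resources.get("Utilized", {})
    dedicated = resources.get("Dedicated", {})
    return {
        name: amount - dedicated.get(name, 0)
        for name, amount in used.items()
        if amount > dedicated.get(name, 0)
    }


SAMPLE = """\
Configured Resources: PROCS: 8  MEM: 7970M  SWAP: 11G  DISK: 1M
Utilized   Resources: PROCS: 8  SWAP: 1189M
Dedicated  Resources: PROCS: 8
"""

resources = parse_resources(SAMPLE)
excess = undedicated_usage(resources)
print(excess)
```

For the node above this reports only the swap usage, which matches the observation that the 8 processors are fully accounted for by the dedicated allocation.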

The checknode output additionally provides an overview of any current and future resource reservations. The example above shows two future standing reservations (res1) and one current reservation for a job (66). Current reservations can be identified by their negative start time, which is given relative to the present.
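The negative start time rule can be applied mechanically to the Reservations section. The following sketch assumes reservation lines look like those in the output above (name and processor count joined by "x", then a type, then relative start and end times in [DD:]HH:MM:SS form). The names to_seconds and parse_reservations are hypothetical.

```python
import re

RES_RE = re.compile(
    r"\s*(\S+)x(\d+)\s+(\S+)\s+(-?[\d:]+)\s+->\s+(-?[\d:]+)\s+\(([\d:]+)\)"
)


def to_seconds(stamp):
    """Convert a relative [DD:]HH:MM:SS time, possibly negative, to seconds."""
    sign = -1 if stamp.startswith("-") else 1
    parts = [int(p) for p in stamp.lstrip("-").split(":")]
    parts = [0] * (4 - len(parts)) + parts  # pad missing days/hours with zero
    days, hours, minutes, seconds = parts
    return sign * (((days * 24 + hours) * 60 + minutes) * 60 + seconds)


def parse_reservations(text):
    """Return one dict per reservation line; other lines are ignored."""
    reservations = []
    for line in text.splitlines():
        match = RES_RE.match(line)
        if match:
            name, procs, kind, start, end, _duration = match.groups()
            reservations.append({
                "name": name,
                "procs": int(procs),
                "type": kind,
                "start": to_seconds(start),
                "end": to_seconds(end),
            })
    return reservations


SAMPLE = """\
Reservations:
  res1.609x8  User  9:51:12 -> 11:51:12 (2:00:00)
    Blocked Resources@9:51:12     Procs: 8/8 (100.00%)  Mem: 0/7970 (0.00%)
  66x8  Job:Running  -00:32:26 -> 1:27:34 (2:00:00)
  res1.614x8  User  1:09:51:12 -> 1:11:51:12 (2:00:00)
Jobs:        66
"""

reservations = parse_reservations(SAMPLE)
current = [r["name"] for r in reservations if r["start"] < 0]
future = [r["name"] for r in reservations if r["start"] >= 0]
print(current, future)
```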

Aside from typical computational resources such as CPU, main memory and disk, Torque and Moab allow user-defined resources to be associated with nodes. Such resources can be, for example, special hardware devices, a limited number of software licenses or application-specific resource measures (e.g. Dreamhost’s Conueries). These kinds of resources are typically defined through the NODECFG directive in Moab’s configuration file.
