Diagnosing Torque / PBS Jobs

  • checkjob <JOBID>
show information about job <JOBID> (including which node(s) are reserved for it)
  • checkjob -v <JOBID>
if currently queued, show why job <JOBID> is not yet running on each node
memory reserved per task appears as, e.g., “Dedicated Resources Per Task: PROCS: 1 MEM: 2048M”
number of processors (tasks) requested appears as, e.g., “Total Requested Tasks: 8”
in your job script, look for “pmem=” to see how much memory per process you requested
  • qalter -l pmem=3g <JOBID>
change the memory request of job <JOBID> to 3GB per process
useful if you accidentally requested too much memory when submitting
  • showq -w user=$USER
show all of your jobs: those currently running, those queued until resources become available,
and those blocked (for example, held) for some reason (use checkjob -v above to see why)
  • tracejob -n <NUMDAYS> <JOBID> 2> /dev/null
show archived log information about job <JOBID> after it has stopped running
searches logs from the past <NUMDAYS> days (by default, only the current day’s logs); “2> /dev/null” discards error messages
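Some of the checks above can be wrapped in small shell helpers. This is a minimal bash sketch, assuming the Torque/Moab client tools (checkjob, tracejob) are on your PATH. The function names summarize_checkjob, pmem_requested and job_history are hypothetical, and the parsing relies only on the two checkjob -v lines quoted above, whose exact layout may differ between versions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; these names are illustrative, not part of Torque/Moab.

# Read `checkjob -v` output on stdin and print the memory per task, the number
# of tasks, and the implied total memory reservation (per-task MEM x tasks).
# Assumes lines shaped like:
#   Dedicated Resources Per Task: PROCS: 1 MEM: 2048M
#   Total Requested Tasks: 8
summarize_checkjob() {
  awk '
    /Dedicated Resources Per Task:/ {
      # take the field that follows "MEM:"
      for (i = 1; i <= NF; i++) if ($i == "MEM:") mem = $(i + 1)
    }
    /Total Requested Tasks:/ { tasks = $NF }
    END {
      n = mem + 0                            # numeric prefix, e.g. 2048 from "2048M"
      unit = mem; sub(/^[0-9.]+/, "", unit)  # remaining unit suffix, e.g. "M"
      printf "mem_per_task=%s tasks=%s total=%d%s\n", mem, tasks, n * tasks, unit
    }'
}

# Print every pmem=... request found on #PBS lines of a job script ($1).
pmem_requested() {
  grep '^#PBS' "$1" | grep -o 'pmem=[^,[:space:]]*'
}

# Search tracejob logs for job $1 over the past $2 days (default 1),
# discarding tracejob's error messages.
job_history() {
  tracejob -n "${2:-1}" "$1" 2> /dev/null
}

# Usage (job ID 12345 and myjob.pbs are placeholders):
#   checkjob -v 12345 | summarize_checkjob
#   pmem_requested myjob.pbs
#   job_history 12345 7
```

For a job with the example values above (2048M per task, 8 tasks), summarize_checkjob reports a total of 16384M, which is a quick way to see whether an oversized pmem request explains why the job is still queued.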