How to use MP Clusters (SLURM)

Logging in

See How to use Linux Servers for details.

Running Jobs

Setting Up a Job

Here is a sample SLURM script to get you going. Instead of running a binary manually on the command line, a script like this runs the binary in batch mode under SLURM. Look below for other example scripts.
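A minimal batch script might look like the following sketch; the partition name, resource amounts, and program name are placeholders that you will need to adapt to your own cluster and application.

    #!/bin/bash
    #SBATCH --job-name=myjob              # name shown in the queue
    #SBATCH --partition=partition_name    # your group's partition
    #SBATCH --ntasks=1                    # a single serial task
    #SBATCH --time=01:00:00               # wall-clock limit (hh:mm:ss)
    #SBATCH --mem=4G                      # memory for the whole job
    #SBATCH --output=myjob_%j.out         # output file; %j expands to the job ID

    ./my_binary input.dat                 # the program you would otherwise run by hand

Save it as, for example, myjob.sh and submit it with sbatch myjob.sh.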

Submitting Jobs

  • sbatch
  • srun

The most essential commands to learn are sbatch and srun. Their manual pages will show you all the options, but at the very minimum you need to create a job script and submit it with srun scriptname. This will run a serial single-task job script as soon as a slot is available, and use your existing console session for output. If you want to queue up the job, use sbatch scriptname. This will queue up the job and run it as soon as a slot is available, sending output to files. Inside the batch script, you can run multiple srun commands to distribute individual tasks among the resources (memory, CPU cores) reserved for that batch job, as sketched below.
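The multi-step pattern looks roughly like this sketch; the task counts and program names are placeholders.

    #!/bin/bash
    #SBATCH --ntasks=4                    # reserve four task slots for the whole batch job
    #SBATCH --time=02:00:00

    # Each srun below launches a job step within the resources reserved above.
    srun ./preprocess                     # step 1, runs in the reserved allocation
    srun ./simulate                       # step 2, starts after step 1 finishes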

  • mpirun

For parallel jobs that use MPI for inter-process communication, use mpirun to launch the computational application from within your batch script. OpenMPI is tightly integrated with SLURM, so you can omit the -np and --hostfile options to mpirun: it automatically determines how many tasks to run, and on which nodes, from the allocation SLURM made for your job.
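A sketch of such an MPI batch script follows; the task count, module name, and application name are placeholders, and the module load line only applies if your site provides environment modules.

    #!/bin/bash
    #SBATCH --ntasks=16                   # total number of MPI ranks
    #SBATCH --time=04:00:00
    #SBATCH --partition=partition_name

    module load openmpi                   # only if your site uses environment modules; name may differ

    # No -np or --hostfile needed: OpenMPI reads the allocation from SLURM.
    mpirun ./my_mpi_app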

Interactive session within your group’s partition

For an interactive session, simply use srun -n 1 --pty bash. When a CPU is available in your group's partition, this command will launch a shell on one of your dedicated nodes. Anything you run inside that shell runs on the compute node itself, not on the login node.
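You can pass the usual srun resource options if the defaults are not enough; the values below are only examples.

    # basic single-task interactive shell
    srun -n 1 --pty bash

    # interactive shell with 4 CPU cores and a 2-hour time limit
    srun -n 1 --cpus-per-task=4 --time=02:00:00 --pty bash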

Keep in mind that as long as your interactive session is active, you are using up resources that might otherwise be available for other batch jobs in your group. Remember to exit out of the interactive shell when you are done!

Diagnosing jobs

In general, debug interactively first, before running large jobs. This saves you wasted time and effort, not to mention frustration. If you submit a job and the scheduler complains about “unavailable resources”, make sure your job’s expected runtime (start time + duration) does not overlap a scheduled cluster outage. Check for announcements from the system administrators regarding such outages.

Here is a list of SLURM commands that can help diagnose problems or monitor the status of a job.

  • sinfo (see sinfo help)
    • sinfo -o "%20N %10T %5c %8m %16f %40G" -p partition_name
      show status of all nodes within your group’s partition
  • squeue
    • squeue -u USERNAME
      show jobs queued for $USERNAME
  • scontrol
    • scontrol --details show job JOBID
      show detailed information about job $JOBID

To find information on specific nodes, you can use this variant:

  • sinfo
    • sinfo -n comma_separated_nodenames -N --long -p partition_name
      show extended information about individual nodes
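Putting these together, a typical diagnosis of a stuck job might look like the following; JOBID, USERNAME, and partition_name are placeholders, and the exact output layout can differ between SLURM versions.

    squeue -u USERNAME                                           # is the job pending, running, or gone?
    scontrol --details show job JOBID                            # full job record
    scontrol --details show job JOBID | grep -o 'Reason=[^ ]*'   # why is it still pending?
    sinfo -o "%20N %10T %5c %8m %16f %40G" -p partition_name     # are the partition's nodes up and idle?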

Deleting jobs

scancel JOBID will stop and remove a queued or currently running job.
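A few common variants, with JOBID, USERNAME, and the job name as placeholders:

    scancel JOBID                 # cancel a single job by ID
    scancel -u USERNAME           # cancel all of your own jobs
    scancel --name=myjob          # cancel jobs by job name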

Advanced Usage

For the most common options see this sample SLURM script.

Your needs may vary; depending on how your cluster is configured, you may need some advanced SLURM options to make optimal use of it for your application.
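A few directives that often come up in such cases are sketched below; whether they are appropriate, and which values to use, depends on your cluster's configuration, so treat them as starting points rather than a recipe.

    #SBATCH --array=1-10                  # job array: run the script 10 times, with SLURM_ARRAY_TASK_ID set in each
    #SBATCH --dependency=afterok:JOBID    # start only after job JOBID has finished successfully
    #SBATCH --mem-per-cpu=2G              # request memory per allocated CPU instead of per job
    #SBATCH --constraint=FEATURE          # request nodes with a specific feature tag (see the sinfo %f column)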

If you are running complex multi-step procedures, or storing data locally in /scratch-local that needs to be cleaned up afterwards, you may run into trouble if your job is canceled manually or terminated for other reasons. Check out this SLURM batch script with scratch storage handling for ideas.
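One common pattern, sketched below, is a shell trap that removes the scratch directory whenever the script exits. This assumes /scratch-local is writable on the compute nodes; whether the trap gets a chance to run on cancellation depends on how your site has configured job termination signals.

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    SCRATCH_DIR=/scratch-local/$USER/$SLURM_JOB_ID
    mkdir -p "$SCRATCH_DIR"

    # Remove the scratch directory on normal exit and on SIGTERM
    # (on most configurations SLURM sends SIGTERM before killing a canceled job).
    cleanup() { rm -rf "$SCRATCH_DIR"; }
    trap cleanup EXIT TERM

    cd "$SCRATCH_DIR"
    srun ./my_program                     # replace with your actual workload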

Still need help?

See also How to use Linux Servers for more general assistance.

The SLURM developers provide excellent documentation at https://slurm.schedmd.com

See what other SLURM users have to say at http://groups.google.com/group/slurm-users