How to use MP Clusters (SLURM)

Logging in

See How to use Linux Servers for details.

Running Jobs

Setting Up a Job

Here is a sample SLURM script to get you going. Instead of running a binary manually on the command line, a script like this runs the binary in batch mode under SLURM. Look below for other example scripts.
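A minimal batch script might look like the following sketch; the partition name, resource amounts, and program name are placeholders that you will need to adapt to your own cluster and application.

    #!/bin/bash
    #SBATCH --job-name=myjob              # name shown in the queue
    #SBATCH --partition=partition_name    # your group's partition
    #SBATCH --ntasks=1                    # a single serial task
    #SBATCH --time=01:00:00               # wall-clock limit (hh:mm:ss)
    #SBATCH --mem=4G                      # memory for the whole job
    #SBATCH --output=myjob_%j.out         # output file; %j expands to the job ID

    ./my_binary input.dat                 # the program you would otherwise run by hand

Save it as, for example, myjob.sh and submit it with sbatch myjob.sh.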

Submitting Jobs

  • sbatch
  • srun

The most essential commands to learn are sbatch and srun. Their manual pages will show you all the options, but at the very minimum you need to create a job script and submit it with srun scriptname. This will run a serial single-task job script as soon as a slot is available, and use your existing console session for output. If you want to queue up the job, use sbatch scriptname. This will queue up the job and run it as soon as a slot is available, sending output to files. Inside the batch script, you can run multiple srun commands to distribute individual tasks among the resources (memory, CPU cores) reserved for that batch job, as sketched below.
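The multi-step pattern looks roughly like this sketch; the task counts and program names are placeholders.

    #!/bin/bash
    #SBATCH --ntasks=4                    # reserve four task slots for the whole batch job
    #SBATCH --time=02:00:00

    # Each srun below launches a job step within the resources reserved above.
    srun ./preprocess                     # step 1, runs in the reserved allocation
    srun ./simulate                       # step 2, starts after step 1 finishes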

  • mpirun

For parallel jobs that use MPI for inter-process communication, use mpirun to launch the computational application from within your batch script. OpenMPI is tightly integrated with SLURM, so you can omit the -np and --hostfile options to mpirun: it automatically determines how many tasks to run, and on which nodes, from the allocation SLURM made for your job.
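A sketch of such an MPI batch script follows; the task count, module name, and application name are placeholders, and the module load line only applies if your site provides environment modules.

    #!/bin/bash
    #SBATCH --ntasks=16                   # total number of MPI ranks
    #SBATCH --time=04:00:00
    #SBATCH --partition=partition_name

    module load openmpi                   # only if your site uses environment modules; name may differ

    # No -np or --hostfile needed: OpenMPI reads the allocation from SLURM.
    mpirun ./my_mpi_app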

Interactive session within your group’s partition

For an interactive session, simply use srun -n 1 --pty bash. When a CPU is available in your group's partition, this command will launch a shell on one of your dedicated nodes. Anything you run inside that shell runs on the compute node itself, not on the login node.
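You can pass the usual srun resource options if the defaults are not enough; the values below are only examples.

    # basic single-task interactive shell
    srun -n 1 --pty bash

    # interactive shell with 4 CPU cores and a 2-hour time limit
    srun -n 1 --cpus-per-task=4 --time=02:00:00 --pty bash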

Keep in mind that as long as your interactive session is active, you are using up resources that might otherwise be available for other batch jobs in your group. Remember to exit out of the interactive shell when you are done!

Diagnosing jobs

In general, debug interactively first, before running large jobs. This saves you wasted time and effort, not to mention frustration. If you submit a job and the scheduler complains about “unavailable resources”, make sure your job’s expected runtime (start time + duration) does not overlap a scheduled cluster outage. Check for announcements from the system administrators regarding such outages.

Here is a list of SLURM commands that can help diagnose problems or monitor the status of a job.

  • sinfo (see sinfo help)
    • sinfo -o "%20N %10T %5c %8m %16f %40G" -p partition_name
      show status of all nodes within your group’s partition
  • squeue
    • squeue -u USERNAME
      show jobs queued for $USERNAME
  • scontrol
    • scontrol --details show job JOBID
      show detailed information about job $JOBID

To find information on specific nodes, you can use this variant:

  • sinfo
    • sinfo -n comma_separated_nodenames -N --long -p partition_name
      show extended information about individual nodes
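Putting these together, a typical diagnosis of a stuck job might look like the following; JOBID, USERNAME, and partition_name are placeholders, and the exact output layout can differ between SLURM versions.

    squeue -u USERNAME                                           # is the job pending, running, or gone?
    scontrol --details show job JOBID                            # full job record
    scontrol --details show job JOBID | grep -o 'Reason=[^ ]*'   # why is it still pending?
    sinfo -o "%20N %10T %5c %8m %16f %40G" -p partition_name     # are the partition's nodes up and idle?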

Deleting jobs

scancel JOBID will stop and remove a queued or currently running job.
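A few common variants, with JOBID, USERNAME, and the job name as placeholders:

    scancel JOBID                 # cancel a single job by ID
    scancel -u USERNAME           # cancel all of your own jobs
    scancel --name=myjob          # cancel jobs by job name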

Advanced Usage

For the most common options see this sample SLURM script.

Your needs may vary; depending on how your cluster is configured, you may need some advanced SLURM options to make optimal use of it for your application.
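A few directives that often come up in such cases are sketched below; whether they are appropriate, and which values to use, depends on your cluster's configuration, so treat them as starting points rather than a recipe.

    #SBATCH --array=1-10                  # job array: run the script 10 times, with SLURM_ARRAY_TASK_ID set in each
    #SBATCH --dependency=afterok:JOBID    # start only after job JOBID has finished successfully
    #SBATCH --mem-per-cpu=2G              # request memory per allocated CPU instead of per job
    #SBATCH --constraint=FEATURE          # request nodes with a specific feature tag (see the sinfo %f column)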

If you are running complex multi-step procedures, or storing data locally in /scratch-local that needs to be cleaned up afterwards, you may run into trouble if your job is canceled manually or terminated for other reasons. Check out this SLURM batch script with scratch storage handling for ideas.
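One common pattern, sketched below, is a shell trap that removes the scratch directory whenever the script exits. This assumes /scratch-local is writable on the compute nodes; whether the trap gets a chance to run on cancellation depends on how your site has configured job termination signals.

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    SCRATCH_DIR=/scratch-local/$USER/$SLURM_JOB_ID
    mkdir -p "$SCRATCH_DIR"

    # Remove the scratch directory on normal exit and on SIGTERM
    # (on most configurations SLURM sends SIGTERM before killing a canceled job).
    cleanup() { rm -rf "$SCRATCH_DIR"; }
    trap cleanup EXIT TERM

    cd "$SCRATCH_DIR"
    srun ./my_program                     # replace with your actual workload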

Still need help?

See also How to use Linux Servers for more general assistance.

The SLURM developers provide excellent documentation at https://slurm.schedmd.com

See what other SLURM users have to say at http://groups.google.com/group/slurm-users