See How to use Linux Servers for details.
Running a batch job in SLURM involves two steps:
- Creating a script to run the job. The script includes the actual command to run, environment variables and other settings just like a normal shell script, and some SLURM-specific settings which start with “#SBATCH”.
- Submitting the job to the SLURM scheduler. This will then queue up the script to be run when enough nodes are free.
You can also run an interactive job, where your terminal session retains control of the job. This is normally used to log into a node: if you run a shell like “bash” in an interactive job, you get a bash shell on a compute node, which is very similar to logging in to that node directly. Note that the scheduler still handles these interactive jobs, so you still need to request any resources beyond the defaults.
In general you should not run data analysis jobs on the login node itself. Please do not run any computationally intensive process on the login node that will take more than a few hours. The login node can be used for short-term tasks like compiling software or installing packages with conda.
Setting Up a Job
Here is a sample SLURM script to get you going. Instead of running your binary by hand on the login node, this is how you would run it in batch mode under SLURM. Look below for other example scripts.
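A minimal serial job script might look like the following sketch. The job name, output file, memory request, time limit, and program name are all placeholders; adjust them for your cluster and application:

```shell
#!/bin/bash
#SBATCH --job-name=my_analysis       # name shown by squeue
#SBATCH --output=my_analysis_%j.out  # %j expands to the job ID
#SBATCH --ntasks=1                   # serial job: a single task
#SBATCH --mem=4G                     # placeholder memory request
#SBATCH --time=02:00:00              # wall-clock limit (hh:mm:ss)

# Environment variables and setup, just like a normal shell script
export OMP_NUM_THREADS=1

# Replace with the actual binary and input you want to run
./my_binary input.dat
```

Submit it with sbatch scriptname; the output will land in the file named by --output.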
The most essential commands to learn are sbatch and srun. The manual pages will show you all the options, but at a minimum you need to create a job script and submit it with srun scriptname. This will run a serial, single-task job script as soon as a slot is available, and use your existing console session for output. If you want to queue up the job instead, use sbatch scriptname. This will queue up the job and run it as soon as a slot is available, sending output to files. Inside the batch script, you can run multiple srun commands to distribute individual tasks among the resources (memory, CPU cores) reserved for that batch job.
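As a sketch of that last point, a batch script that reserves four task slots and splits them between two concurrent job steps might look like this (step_a and step_b are hypothetical programs):

```shell
#!/bin/bash
#SBATCH --ntasks=4        # reserve four task slots for the whole job

# Each srun launches one job step using part of the job's allocation;
# backgrounding the steps with '&' lets them run side by side.
srun --ntasks=2 ./step_a &
srun --ntasks=2 ./step_b &
wait                      # block until both steps have finished
```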
For parallel jobs that use MPI for inter-process communication, use mpirun to launch your application from within the batch script. OpenMPI is tightly integrated with SLURM, so you can omit the -np and --hostfile options to mpirun: it automatically determines how many tasks to run, and on which nodes, based on what SLURM assigned to your job.
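For example, a minimal MPI batch script might look like this sketch (my_mpi_program and the rank count are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --ntasks=32       # total MPI ranks; SLURM chooses the nodes
#SBATCH --time=04:00:00

# With OpenMPI's SLURM integration, mpirun picks up the task count and
# node list from the job environment -- no -np or --hostfile needed.
mpirun ./my_mpi_program
```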
Interactive session within your group’s partition
For an interactive session, simply use srun -n 1 --pty bash (note: there are two dashes before “pty”; web browsers tend to render this wrong). When a CPU is available in your group’s partition, this command will launch a shell on one of your dedicated nodes. Anything you run inside that shell is running on the compute node itself, not on the login node.
Keep in mind that as long as your interactive session is active, you are using up resources that might otherwise be available for other batch jobs in your group. Remember to exit out of the interactive shell when you are done!
In general, debug interactively first before running large jobs. This saves you wasted time and effort, not to mention frustration. If you submit a job and the scheduler complains about “unavailable resources”, make sure your job’s expected runtime (start time + duration) does not overlap a scheduled cluster outage. Check for announcements from the system administrators regarding such outages.
Here is a list of SLURM commands that can help diagnose problems or monitor the status of a job.
- sinfo (see sinfo help)
- sinfo -o "%20N %10T %5c %8m %16f %40G" -p partition_name
  - show status of all nodes within your group’s partition
- squeue -u USERNAME
  - show jobs queued for USERNAME
- scontrol --details show job JOBID
  - show detailed information about job JOBID
To find information on specific nodes, you can use this variant:
- sinfo -n comma_separated_nodenames -N --long -p partition_name
  - show extended information about individual nodes
scancel JOBID will stop and remove a queued or currently running job.
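For example (123456 stands in for a real job ID from squeue):

```shell
# Cancel one specific job
scancel 123456

# Cancel all of your own jobs, queued or running
scancel -u "$USER"
```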
For the most common options see this sample SLURM script.
Your needs may vary, and depending on the configuration of your cluster you might need to use some advanced SLURM options to ensure optimal usage of the cluster for your application.
If you are doing complex multistep procedures, or need to store data locally in /scratch-local and need to clean up after yourself, you may run into issues if your job is manually canceled or terminated for other reasons. Check out this SLURM batch script with scratch storage handling for ideas.
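As a starting point, here is a sketch of trap-based scratch cleanup. On the cluster you would point the scratch directory at /scratch-local; this sketch falls back to $TMPDIR (or /tmp) and the shell PID so it also runs outside SLURM:

```shell
#!/bin/bash
#SBATCH --job-name=scratch_job
#SBATCH --time=01:00:00

# Unique per-job scratch directory: uses the SLURM job ID when present,
# otherwise the shell PID (so the script can be tested without SLURM).
SCRATCH_DIR="${TMPDIR:-/tmp}/job_${SLURM_JOB_ID:-$$}"

cleanup() {
    rm -rf "$SCRATCH_DIR"
}
# EXIT covers normal completion; TERM is what SLURM sends when a job is
# canceled or hits its time limit, so cleanup runs in both cases.
trap cleanup EXIT TERM

mkdir -p "$SCRATCH_DIR"
# ... do the actual work inside "$SCRATCH_DIR" ...
touch "$SCRATCH_DIR/results.dat"
```

Remember to copy any results you want to keep out of scratch before the script ends, since cleanup removes the whole directory.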
Still need help?
See also How to use Linux Servers for more general assistance.
The SLURM developers provide excellent documentation at https://slurm.schedmd.com
See what other SLURM users have to say at http://groups.google.com/group/slurm-users