Slurm

SLURM (Simple Linux Utility for Resource Management) [1] [5] is an open-source, highly scalable workload manager designed for distributed computing environments. It provides job scheduling, job submission, and system monitoring, and lets users manage computing resources efficiently. Its flexibility and scalability make it a common choice for researchers who want to make the most of their computing resources and streamline their workflows.

Common commands:

Purpose                                                               Command
Check which queues (partitions) are available                         sinfo
Submit a job                                                          sbatch
View the queue status                                                 squeue
View the queue status of your own jobs                                squeue -u $USER
Cancel a running or pending job                                       scancel JOBID
View detailed information about a job                                 scontrol show job JOBID
View resource accounting information for finished and running jobs   sacct
View resource accounting information for running jobs                 sstat
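
A typical session with these commands might look like the following sketch; my_job.sh and the job ID 123456 are placeholders, not values from a real cluster.

sinfo                        # list partitions and their state
sbatch my_job.sh             # submit the batch script; prints "Submitted batch job 123456"
squeue -u $USER              # show only your own jobs
scontrol show job 123456     # detailed information for one job
scancel 123456               # cancel the job if needed
sacct -j 123456              # accounting information once the job has started or finished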

Sample script

#!/bin/bash

# NOTE: Lines starting with "#SBATCH" are SLURM directives, while lines
# starting with "#" or "##SBATCH" are comments. To enable a "##SBATCH" line,
# remove one "#" so that it starts with "#SBATCH".

#SBATCH -J slurm_job # Slurm job name
#SBATCH -t 48:00:00 # time limit (HH:MM:SS or D-HH:MM:SS)

# Enable email notifications when the job begins and ends; uncomment if you need it
##SBATCH --mail-user=waipangsze@gmail.com # Update your email address
##SBATCH --mail-type=end # NONE, BEGIN, END, FAIL, REQUEUE, ALL

# Choose partition (queue) to use. Note: replace <partition_to_use> with the name of partition
#SBATCH -p <partition_to_use>

# Use memory, nodes and cores
#SBATCH --mem=32000 # memory per node in MB
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=16 # number of tasks (cores) per node

# Set up the runtime environment if necessary
# For example, load a compiler toolchain (here the GNU 8 toolchain)
module load gnu8
# Alternatively, source ~/.bashrc or ~/.bash_profile
# Go to the job submission directory and run your application
cd $HOME/apps/slurm
mpirun ./your_mpi_application

# or another example: an Intel MPI hello-world
module load intel/compiler/64/2017/17.0.0
module load intel/mkl/64/2017/0.098
module load intel/mpi/64/2017/0.098
cd /gpfs/projects/samples/intel_mpi_hello/
mpiicc mpi_hello.c -o intel_mpi_hello
mpirun ./intel_mpi_hello

# or a GPU job (note: #SBATCH directives are only honoured before the first
# executable command, so place them at the top of the script in practice)
#SBATCH -N 1 -n 8 -p GPU-V100 --gres=gpu:v100:2
./your-gpu-prog

You have to load the modules you need (compiler and corresponding libraries) [2].
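
For example, a short sketch of inspecting and loading modules; the module names here (gnu8, openmpi3) are illustrative and will differ from site to site.

module avail                # list the modules available on the system
module load gnu8 openmpi3   # load a compiler toolchain and an MPI library
module list                 # confirm what is currently loaded
module purge                # unload everything and start from a clean environment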

Upon job completion, the standard output is stored in the job submission directory under the filename slurm-<JobID>.out, unless you assign filenames with

#SBATCH --output=slurm.out    # file to collect standard output
#SBATCH --error=slurm.err     # file to collect standard errors

If the time limit is not specified in the submission script, SLURM assigns the default run time of 3 days, so the job will be terminated after 72 hours. If no memory limit is requested, SLURM assigns the default of 16 GB; the maximum allowed memory per node is 128 GB. To see how much RAM per node your job is using, query MaxRSS with sacct (finished or running jobs) or sstat (running jobs).
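
For example, a small sketch of such a query; the job ID 123456 is a placeholder.

sacct -j 123456 --format=JobID,JobName,Elapsed,MaxRSS,State    # finished or running job
sstat -j 123456.batch --format=JobID,MaxRSS,AveRSS,AveCPU      # running job; address the batch step explicitly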

Query job information

#!/bin/bash
echo "==========================================================="
sacct --format=JobID,Jobname,start,end,elapsed,nnodes,ncpus,AveRSS,MaxRSS,MaxDiskRead,MaxDiskWrite,CPUTime,MaxVMSize
echo "==========================================================="
squeue -o"%.7i %.9P %.8j %.8u %.2t %.10M %.6D %C %m %z %V %l"
squeue -o"%.7i %o"
echo "==========================================================="

sinfo    – view information about partitions and nodes

squeue   – view jobs currently in the queue

sacct    – view accounting data for running and completed jobs

scontrol – view detailed state of jobs, nodes and partitions

One more alias for checking historical jobs:

alias sacct_job='sacct --format="JobId,JobName%50,Partition,Nodelist,Account,AllocCPUS,State,Submit,Start,End,Elapsed"'

sacct_job --starttime=2023-05-01

seff

The Slurm job efficiency report (seff)[3]

Summary: seff takes a jobid and reports on the efficiency of that job's cpu and memory utilization. The rpm/tarball comes with an 'smail' utility that allows for Slurm end-of-job emails to include a seff report. This allows users to become aware if they are wasting resources.

# This script is roughly equivalent to:
# sacct -P -n -a --format JobID,User,Group,State,Cluster,AllocCPUS,REQMEM,TotalCPU,Elapsed,MaxRSS,ExitCode,NNodes,NTasks -j <job_id>

or copy it from the SchedMD slurm repository on GitHub (contribs/seff).
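
A minimal usage sketch, assuming 123456 is the ID of a completed job:

seff 123456    # reports the job's state, CPU efficiency and memory efficiency

Efficiency figures are generally only meaningful once the job has finished.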

ulimit -s unlimited

ulimit -l unlimited   # remove the limit on locked-in-memory size
ulimit -s unlimited   # remove the limit on stack size

SLURM Job Dependencies

You may submit jobs that run depending on the status of previously submitted jobs, or schedule a series of jobs to run one after the other. [4]

# Format:
$ sbatch --dependency=type:job_id jobfile
$ sbatch --dependency=type:job_id:job_id:job_id jobfile # requires several jobs to be completed

jobID=$(sbatch --parsable first_job.sh)   # --parsable makes sbatch print only the job ID
sbatch --dependency=afterok:${jobID} second_job.sh
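
A slightly longer sketch with hypothetical scripts prep_a.sh, prep_b.sh and postprocess.sh: the final job starts only if both preparation jobs complete successfully.

id1=$(sbatch --parsable prep_a.sh)
id2=$(sbatch --parsable prep_b.sh)
# several job IDs for the same dependency type are separated by colons
sbatch --dependency=afterok:${id1}:${id2} postprocess.sh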

Submit a job to a specific node

sbatch --nodelist=node[01-09] jobfile
sbatch --nodelist=node01 jobfile
sbatch --exclude=node[10-19] jobfile   # exclude a node or a node list
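
If you are not sure which node names exist, one possible sketch (the node name node05 is hypothetical):

sinfo -N -l                        # node-oriented listing: partition, state, CPUs and memory per node
sbatch --nodelist=node05 jobfile   # then pin the job to the node you want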

--mem-per-cpu

If your job requests more memory than it actually uses, the correct action is to decrease the requested memory, for example with #SBATCH --mem-per-cpu=3G. It is still wise to request slightly more memory than you expect to need, because the job will fail if it runs out of memory.

However, it is important not to request an excessive amount because that would make it harder for the job scheduler to schedule the job which results in longer queue times.
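
A minimal sketch of how --mem-per-cpu interacts with the task count; the partition name and application are placeholders. With 16 tasks and 3 GB per CPU, the job requests 48 GB on the node in total.

#!/bin/bash
#SBATCH -p <partition_to_use>     # replace with a real partition name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem-per-cpu=3G          # 16 tasks x 3 GB = 48 GB requested on the node
srun ./your_application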

Parameters for OpenMP

The OMP_STACKSIZE environment variable controls the size of the stack for threads created by the OpenMP implementation.

# Parameters for OpenMP
export OMP_STACKSIZE=512m # the number is suffixed by B, K, M or G
export OMP_NUM_THREADS=1
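
For an OpenMP job it is common to tie OMP_NUM_THREADS to the number of CPUs SLURM allocates per task; below is a minimal sketch, with an illustrative partition and application name.

#!/bin/bash
#SBATCH -J omp_job
#SBATCH -p <partition_to_use>       # replace with a real partition name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16          # CPUs reserved for the OpenMP threads

export OMP_STACKSIZE=512m                        # per-thread stack size
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}    # match the allocation

./your_openmp_application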

References

  1. Shanghai Jiao Tong University HPC Platform User Manual – Slurm job scheduling system documentation
  2. USTC Supercomputing Center User Manual – Slurm job scheduling system
