Batch system examples

We have made a few examples of how to use the batch system, both from the command line and through a batch submit file.

Overview of the Batch system examples subpages:

Slurm GPU Resources (Kebnekaise)
Examples, srun
Job Dependencies
Job Cancellation
Job Status
Slurm MPI + OpenMP examples
Slurm OpenMP Examples

Slurm GPU Resources (Kebnekaise)

We have two types of GPU cards available on Kebnekaise: NVIDIA Tesla K80 (Kepler) and NVIDIA Tesla V100 (Volta).

To request GPU resources one has to include a GRES in the submit file. The general format is:

#SBATCH --gres=gpu:<type-of-card>:x

where <type-of-card> is either k80 or v100 and x = 1, 2, or 4 (4 only for the K80 type).

The K80-enabled nodes contain either two or four K80 cards, and each K80 card contains two GPU engines.
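
For example, a minimal submit-file sketch requesting two V100 cards could look like the following (the project ID and program name are placeholders):

#!/bin/bash
# Project/Account (placeholder)
#SBATCH -A hpc2n-1234-56
# Request two V100 cards
#SBATCH --gres=gpu:v100:2
# Runtime of this job is less than 30 minutes
#SBATCH --time=00:30:00

./my_gpu_program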

Examples, srun

You can submit programs directly to the batch system with 'srun', giving all the options on the command line. For larger or more complicated jobs you should normally use a job script and submit it with 'sbatch'.

More information about parameters and job submission files can be found on the Slurm submit file design page.

Run 2 tasks, each on a different core.

$ srun -A <account> -n 2 my_program

Run 6 tasks distributed across 2 nodes.
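
The corresponding command could look like this (same account and program-name placeholders as above):

$ srun -A <account> -N 2 -n 6 my_program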

Job dependencies - SLURM

A job can be given the constraint that it only starts after another job has finished.

In the following example, we have two jobs, A and B. We want Job B to start after Job A has successfully completed.

First we start Job A by submitting it via sbatch:

$ sbatch <jobA.sh>

Making note of the job ID assigned to Job A, we then submit Job B with the added condition that it only starts after Job A has successfully completed.
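
A sketch of that second submission, using Slurm's afterok dependency type, where <jobid> stands for the job ID reported when Job A was submitted:

$ sbatch --dependency=afterok:<jobid> <jobB.sh>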

Deleting a job

To cancel a job, use scancel. You need the job ID of the running or pending job. Only the job's owner and SLURM administrators can cancel jobs.
$ scancel <jobid>

To cancel all your jobs (running and pending) you can run

$ scancel -u <username>

You get the job id when you submit the job.

$ sbatch -N 1 -n 4 submitfile
Submitted batch job 173079
$ scancel 173079

You can also find the job id with squeue.
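
For example, to list only your own jobs (same username placeholder as for scancel above):

$ squeue -u <username>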

Job status

To see the status of partitions and nodes, use

$ sinfo

To get the status of all SLURM jobs

$ squeue

To only view the jobs in the bigmem partition (Abisko)

$ squeue -p bigmem

To only view the jobs in the largemem partition (Kebnekaise)

$ squeue -p largemem

To get the status of an individual job

$ scontrol show job <jobid>

Slurm MPI + OpenMP examples

This example shows a hybrid MPI/OpenMP job with 4 tasks and 48 cores per task, on Abisko.

#!/bin/bash
# Example with 4 tasks and 48 cores per task for MPI+OpenMP
#
# Project/Account
#SBATCH -A hpc2n-1234-56
#
# Number of MPI tasks
#SBATCH -n 4
#
# Number of cores per task
#SBATCH -c 48
#
# Runtime of this job is less than 12 hours.
#SBATCH --time=12:00:00
#

# Set OMP_NUM_THREADS to the same value as -c
# with a fallback in case it isn't set.
# SLURM_CPUS_PER_TASK is set to the value of -c, but only if -c is explicitly set
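if [ -n "$SLURM_CPUS_PER_TASK" ]; then
  omp_threads=$SLURM_CPUS_PER_TASK
else
  omp_threads=1
fi
export OMP_NUM_THREADS=$omp_threads

# Launch the hybrid program with srun; "mpi_openmp_program" is a
# placeholder for your own binary.
srun ./mpi_openmp_program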

Slurm OpenMP Examples

This example shows a 48-core OpenMP job (the maximum size for one node on Abisko).

    #!/bin/bash
    # Example with 48 cores for OpenMP
    #
    # Project/Account
    #SBATCH -A hpc2n-1234-56
    #
    # Number of cores per task
    #SBATCH -c 48
    #
    # Runtime of this job is less than 12 hours.
    #SBATCH --time=12:00:00
    #

    # Set OMP_NUM_THREADS to the same value as -c
    # with a fallback in case it isn't set.
    # SLURM_CPUS_PER_TASK is set to the value of -c, but only if -c is explicitly set
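    if [ -n "$SLURM_CPUS_PER_TASK" ]; then
      omp_threads=$SLURM_CPUS_PER_TASK
    else
      omp_threads=1
    fi
    export OMP_NUM_THREADS=$omp_threads

    # Run the OpenMP program; "openmp_program" is a placeholder
    # for your own binary.
    ./openmp_program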
