
Step-by-step guide

...

Most of the GPU-equipped compute nodes on Talapas have 4 GPUs per node (see Machine Specifications). Suppose we need only one GPU for our simulation; then our job script might take the following (new) form:

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=GPUjob     ### Job Name
#SBATCH --partition=gpu       ### Similar to a queue in PBS
#SBATCH --time=1-00:00:00     ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1             ### Node count required for the job
#SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per node
#SBATCH --gpus=1              ### Number of GPUs for the entire job (new form)
#SBATCH --account=<myPIRG>    ### Account used for job submission

my_executable $SLURM_JOB_GPUS

In this example, the program my_executable expects the GPU ordinal as an input. We use the variable SLURM_JOB_GPUS to pass that information from SLURM without knowing a priori which GPU the job will run on.

SLURM_JOB_GPUS is a list of the ordinal indexes of the GPUs assigned to your job by SLURM. With a request for a single GPU, this variable will store a single numeral from 0 to 3 (there are 4 GPUs on each node). If you wanted to use two GPUs, you would change --gpus=1 to --gpus=2 (or --gres=gpu:1 to --gres=gpu:2 in the old form below), and SLURM_JOB_GPUS would then store a list of the form 0,1 (for example).

NOTE:  The SLURM_JOB_GPUS variable is not set when using srun. The variable GPU_DEVICE_ORDINAL is set when using srun or sbatch and can be used instead.
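
If the same script might be launched with either sbatch or srun, a small fallback covers both cases.  The following is only a sketch; it assumes, as in the example above, that my_executable takes the comma-separated GPU ordinal list as its first argument.

Code Block
languagebash
# Sketch: prefer SLURM_JOB_GPUS (set by sbatch), fall back to
# GPU_DEVICE_ORDINAL (set by both srun and sbatch).
GPU_IDS=${SLURM_JOB_GPUS:-$GPU_DEVICE_ORDINAL}

# Pass the comma-separated ordinal list (e.g. "0" or "0,1") to the program.
my_executable "$GPU_IDS"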

There is also an old form that is similar.  For most purposes, the new form is what you should use.  Note, however, that with the new form, it's very important to specify '--nodes=1' if you need all of the GPUs to be allocated on a single node; a two-GPU sketch illustrating this follows the old-form example below.

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=GPUjob     ### Job Name
#SBATCH --partition=gpu       ### Similar to a queue in PBS
#SBATCH --time=1-00:00:00     ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1             ### Node count required for the job
#SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per node
#SBATCH --gres=gpu:1          ### General REServation of gpu:number of gpus
#SBATCH --account=<myPIRG>    ### Account used for job submission

my_executable $SLURM_JOB_GPUS
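
For example, here is a sketch of a new-form request for two GPUs that must both land on the same node.  It assumes the newer per-job --gpus syntax shown above; without --nodes=1, the two GPUs could end up on different nodes.

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=GPUjob     ### Job Name
#SBATCH --partition=gpu       ### Similar to a queue in PBS
#SBATCH --time=1-00:00:00     ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1             ### Keep both GPUs on a single node
#SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per node
#SBATCH --gpus=2              ### Two GPUs for the entire job (new form)
#SBATCH --account=<myPIRG>    ### Account used for job submission

my_executable $SLURM_JOB_GPUS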


GPU types

Currently, the 'gpu' and 'longgpu' partitions contain only NVIDIA K80 GPUs.  Thus, you don't need to explicitly specify a GPU type.  Similarly, most other partitions contain only one kind of GPU.

The notable exception is the 'preempt' partition, which contains most Talapas nodes.  The easiest way to see the available types is by running the command '/packages/racs/bin/slurm-show-gpus'.

Alternatively, you can use node features (with the '--constraint' flag) to limit the possibilities.  See '/packages/racs/bin/slurm-show-features'.
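
For example, a job in the 'preempt' partition could be pinned to a particular GPU type with a constraint.  The sketch below is illustrative only: the feature name 'k80' is a placeholder, so check the output of slurm-show-features for the names actually defined on Talapas.

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=GPUjob     ### Job Name
#SBATCH --partition=preempt   ### Partition containing several GPU types
#SBATCH --time=1-00:00:00     ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1             ### Node count required for the job
#SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per node
#SBATCH --gpus=1              ### One GPU for the entire job
#SBATCH --constraint=k80      ### Placeholder feature name; see slurm-show-features
#SBATCH --account=<myPIRG>    ### Account used for job submission

my_executable $SLURM_JOB_GPUS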


...