Step-by-step guide
...
Most of the GPU-equipped compute nodes on Talapas have 4 GPUs per node (see Machine Specifications). Suppose we need only one GPU for our simulation; then our job script may take the (new) form
#!/bin/bash
#SBATCH --job-name=GPUjob ### Job Name
#SBATCH --partition=gpu ### Similar to a queue in PBS
#SBATCH --time=1-00:00:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Node count required for the job
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --gres=gpu:1 ### General REServation of gpu:number of gpus
#SBATCH --account=<myPIRG> ### Account used for job submission
my_executable $SLURM_JOB_GPUS
In this example, the program my_executable expects the GPU ordinal as an input. We use the variable SLURM_JOB_GPUS to pass that information from SLURM without knowing a priori which GPU the job will run on. SLURM_JOB_GPUS is a list of the ordinal indexes of the GPUs assigned to the job by SLURM. With a request for a single GPU, this variable will store a single numeral from 0 to 3 (there are 4 GPUs on each node). To use two GPUs instead, change gres=gpu:1 to gres=gpu:2; SLURM_JOB_GPUS would then store a list of the form 0,1 (for example).
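As a sketch of how a job script might consume this variable, the snippet below hard-codes a two-GPU allocation (inside a real job, SLURM sets the value) and uses the standard CUDA_VISIBLE_DEVICES convention to restrict a CUDA application to the assigned devices:

```shell
#!/bin/bash
# Simulate the value SLURM would set for a two-GPU job (hard-coded for illustration).
SLURM_JOB_GPUS="0,1"

# Restrict CUDA applications to the assigned devices only.
export CUDA_VISIBLE_DEVICES="$SLURM_JOB_GPUS"

# Split the comma-separated list to count the assigned GPUs.
IFS=',' read -r -a gpu_ids <<< "$SLURM_JOB_GPUS"
echo "Assigned ${#gpu_ids[@]} GPU(s): ${gpu_ids[*]}"
```

In a real job script you would delete the hard-coded assignment and let SLURM provide SLURM_JOB_GPUS.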
NOTE: The SLURM_JOB_GPUS variable is not set when using srun. The variable GPU_DEVICE_ORDINAL is set when using either srun or sbatch and can be used instead.
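A script that should work under both launchers can prefer SLURM_JOB_GPUS and fall back to GPU_DEVICE_ORDINAL. This is a sketch with the variables simulated; in a real job, SLURM sets them:

```shell
#!/bin/bash
# Simulate an srun environment: SLURM_JOB_GPUS is unset, GPU_DEVICE_ORDINAL is set.
unset SLURM_JOB_GPUS
GPU_DEVICE_ORDINAL="0"

# Prefer SLURM_JOB_GPUS when present, otherwise fall back to GPU_DEVICE_ORDINAL.
gpus="${SLURM_JOB_GPUS:-$GPU_DEVICE_ORDINAL}"
echo "Using GPU(s): $gpus"
```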
There is also an old form that is similar. For most purposes, the new form is what you should use. Note, however, that with the new form, it's very important to specify '--nodes=1' if you need all of the GPUs to be allocated on a single node.
#!/bin/bash
#SBATCH --job-name=GPUjob ### Job Name
#SBATCH --partition=gpu ### Similar to a queue in PBS
#SBATCH --time=1-00:00:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Node count required for the job
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --gres=gpu:1 ### General REServation of gpu:number of gpus
#SBATCH --account=<myPIRG> ### Account used for job submission
my_executable $SLURM_JOB_GPUS
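Following the explanation above, requesting two GPUs only requires changing the gres count; keeping '--nodes=1' ensures both GPUs land on the same node. A sketch of the directives that change (partition and account lines stay as in the example):

```shell
#SBATCH --nodes=1 ### Keep both GPUs on a single node
#SBATCH --gres=gpu:2 ### Request two GPUs instead of one
my_executable $SLURM_JOB_GPUS ### SLURM_JOB_GPUS now holds a list such as 0,1
```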
GPU types
Currently, the 'gpu' and 'longgpu' partitions contain only NVIDIA K80 GPUs, so you don't need to explicitly specify a GPU type. Similarly, most other partitions contain only one kind of GPU.
The notable exception is the 'preempt' partition, which contains most Talapas nodes. The easiest way to see the available types is by running the command '/packages/racs/bin/slurm-show-gpus'.
Alternatively, you can use node features (with the '--constraint' flag) to limit the possibilities. See '/packages/racs/bin/slurm-show-features'.
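As a sketch of that workflow for the 'preempt' partition: list the available types and features first, then submit with a constraint. The feature name 'gpu-k80' below is hypothetical; substitute a name actually reported by slurm-show-features, and 'myjob.sbatch' stands in for your own job script:

```shell
# List available GPU types and node features (site-provided helper scripts).
/packages/racs/bin/slurm-show-gpus
/packages/racs/bin/slurm-show-features

# Constrain a preempt-partition job to nodes with a particular feature.
# 'gpu-k80' is a hypothetical feature name; use one reported above.
sbatch --partition=preempt --constraint=gpu-k80 myjob.sbatch
```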
...