General Principles
Jobs that run on multiple nodes generally use a parallel programming API called MPI (Message Passing Interface), which allows processes on multiple nodes to communicate with high throughput and low latency (especially over Talapas' InfiniBand network). MPI is a standard and has multiple implementations—several are available on Talapas, notably Intel MPI and MPICH.
...
#SBATCH --partition=compute
#SBATCH --ntasks=500
#SBATCH --ntasks-per-core=1
#SBATCH --mem-per-cpu=500m
#SBATCH --constraint=7713
The advantage of this approach is that the job will probably be scheduled sooner, since Slurm is free to use any available cores rather than having to arrange for nodes with sufficient free cores to become available. It’s recommended to keep the job tied to cores of the same type through use of Slurm’s --constraint
flag.
Another method is to specify the number of nodes you want the job to run on and how many tasks to run on each node.
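A minimal sketch of this second approach (the node and per-node task counts here are purely illustrative, not recommendations for any particular workload):

#SBATCH --partition=compute
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=28
#SBATCH --ntasks-per-core=1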
...
See sinfo -o "%10R %8D %25N %8c %10m %40f %35G"
for a complete list of node properties and features relative to their partitions.
...
Also see /packages/racs/bin/slurm-show-features
Memory
For single-node jobs, use the Slurm --mem
flag to specify the total amount of memory to be allocated to the job.
For multi-node jobs, use the Slurm --mem-per-cpu
flag to specify the amount of memory to allocate to each individual task.
#SBATCH --mem-per-cpu=8G
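A single-node job would instead request its total memory with --mem; a minimal sketch (the 32G figure is purely illustrative):

#SBATCH --mem=32G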
...
Slurm Invocation
Slurm provides two slightly different ways to invoke your MPI program.
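Typically these are direct launch with srun, or launch via the mpirun/mpiexec launcher bundled with the MPI implementation. A rough sketch inside a batch script, reusing the example executable built later on this page:

srun ./helloworld_mpi.x
mpirun -np $SLURM_NTASKS ./helloworld_mpi.x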
...
See the Slurm MPI guide for more information.
MPI Compilers
Intel
...
To access the Intel oneAPI MPI compilers, such as mpicc
or mpiifort
:
module load intel-oneapi-compilers/2023.1.0
module load intel-oneapi-mpi/2021.9.0
mpiicc helloworld_mpi.c -o helloworld_mpi.x
Next, create a batch script. For example, to use the recommended srun
approach:
#!/bin/bash
#SBATCH --account=racs
#SBATCH --partition=compute
#SBATCH --job-name=intel-mpi
#SBATCH --output=intel-mpi.out
#SBATCH --error=intel-mpi.err
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=28
#SBATCH --ntasks-per-core=1
module load intel-oneapi-compilers/2023.1.0
module load intel-oneapi-mpi/2021.9.0
srun ./helloworld_mpi.x
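Finally, submit the batch script with sbatch. The filename below is an assumption; substitute whatever you named your script:

sbatch intel-mpi.sbatch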
MPICH
...
GNU + MPICH
To access the MPICH MPI compilers (built with GCC):
module load gcc/13.1.0
module load mpich/4.1.1
mpicc helloworld_mpi.c -o helloworld_mpich.x
#!/bin/bash
#SBATCH --account=racs
#SBATCH --partition=compute,computelong
#SBATCH --job-name=mpich-mpi-test
#SBATCH --output=mpich-mpi-test.out
#SBATCH --error=mpich-mpi-test.err
#SBATCH --time=10
#SBATCH --ntasks=200
#SBATCH --ntasks-per-core=1
#SBATCH --mem-per-cpu=500m
#SBATCH --constraint=7713
module load gcc/13.1.0
module load mpich/4.1.1
srun ./helloworld_mpich.x
...