General Principles

Jobs that run on multiple nodes generally use a parallel programming API called MPI (Message Passing Interface), which allows processes on multiple nodes to communicate with high throughput and low latency (especially over Talapas' InfiniBand network).  MPI is a standard and has multiple implementations—several are available on Talapas, notably Intel MPI and MPICH.
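
The examples that follow compile and run a small MPI program from a file named helloworld_mpi.c. Its source isn't shown on this page; a minimal sketch of such a program, in which every MPI task prints its rank and the node it is running on, might look like this:

Code Block
languagec
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this task's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of tasks */
    MPI_Get_processor_name(name, &len);      /* node this task landed on */

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();                          /* shut down MPI */
    return 0;
}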

...

Code Block
languagetext
#SBATCH --partition=compute
#SBATCH --ntasks=500
#SBATCH --ntasks-per-core=1
#SBATCH --mem-per-cpu=500m
#SBATCH --constraint=7713

The advantage of this approach is that the job will probably be scheduled sooner, since Slurm is free to use any available cores, rather than having to arrange for nodes with sufficient free cores to become available. It’s recommended to keep the job tied to cores of the same type through use of Slurm’s --constraint flag.

Another method is to specify the number of nodes you want the job to run on and how many tasks to run on each node, as sketched below.
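
A minimal sketch of this second approach (the node and task counts here are purely illustrative):

Code Block
languagetext
#SBATCH --partition=compute
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=28
#SBATCH --ntasks-per-core=1
#SBATCH --mem-per-cpu=500m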

...

See sinfo -o "%10R %8D %25N %8c %10m %40f %35G" for a complete list of node properties and features relative to their partitions.

...

Also see /packages/racs/bin/slurm-show-features

Memory

For single-node jobs, use the Slurm --mem flag to specify the total amount of memory to allocate to the job.
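
For example, a hypothetical single-node job that needs 8 GiB in total might request:

Code Block
languagetext
#SBATCH --nodes=1
#SBATCH --mem=8G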

For multi-node jobs, use the Slurm --mem-per-cpu flag to specify the amount of memory to allocate to each individual task.

Code Block
languagetext
#SBATCH --mem-per-cpu=8G

...

Slurm Invocation

Slurm provides two slightly different ways to invoke your MPI program. 

...

See the Slurm MPI guide for more information.

MPI Compilers

Intel

...

To access the Intel oneAPI MPI compilers, such as mpiicc or mpiifort:

Code Block
languagetext
module load intel-oneapi-compilers/2023.1.0
module load intel-oneapi-mpi/2021.9.0
mpiicc helloworld_mpi.c -o helloworld_mpi.x

Next, create a batch script. For example, to use the recommended srun approach:

Code Block
languagebash
#!/bin/bash
#SBATCH --account=racs
#SBATCH --partition=compute
#SBATCH --job-name=intel-mpi
#SBATCH --output=intel-mpi.out
#SBATCH --error=intel-mpi.err
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=28
#SBATCH --ntasks-per-core=1
module load intel-oneapi-compilers/2023.1.0
module load intel-oneapi-mpi/2021.9.0
srun ./helloworld_mpi.x
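
Assuming the script above is saved as intel-mpi.sbatch (the filename is arbitrary), it can be submitted and monitored with the usual Slurm commands:

Code Block
languagetext
sbatch intel-mpi.sbatch    # submit the batch script
squeue -u $USER            # check whether the job is pending or running
cat intel-mpi.out          # view the program output once the job finishes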

MPICH

...

GNU + MPICH

To access the MPICH MPI compilers:

Code Block
module load gcc/13.1.0
module load mpich/4.1.1
mpicc helloworld_mpi.c -o helloworld_mpich.x
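
Then, as with the Intel example, create a batch script that launches the program with srun:
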
Code Block
#!/bin/bash
#SBATCH --account=racs
#SBATCH --partition=compute,computelong
#SBATCH --job-name=mpich-mpi-test
#SBATCH --output=mpich-mpi-test.out
#SBATCH --error=mpich-mpi-test.err
#SBATCH --time=10
#SBATCH --ntasks=200
#SBATCH --ntasks-per-core=1
#SBATCH --mem-per-cpu=500m
#SBATCH --constraint=7713

module load gcc/13.1.0 
module load mpich/4.1.1

srun ./helloworld_mpich.x

...