Comment: Changed mentions of partition "short" to "compute" and "long" to "computelong" to reflect partition name changes from T1 to T2.

...

Code Block
language: bash
title: hostname.batch
#!/bin/bash
#SBATCH --account=<myaccount>   ### change this to your actual account for charging
#SBATCH --partition=compute     ### queue to submit to
#SBATCH --job-name=myhostjob    ### job name
#SBATCH --output=hostname.out   ### file in which to store job stdout
#SBATCH --error=hostname.err    ### file in which to store job stderr
#SBATCH --time=5                ### wall-clock time limit, in minutes
#SBATCH --mem=100M              ### memory limit per node, in MB
#SBATCH --nodes=1               ### number of nodes to use
#SBATCH --ntasks-per-node=1     ### number of tasks to launch per node
#SBATCH --cpus-per-task=1       ### number of cores for each task

hostname

...

Code Block
language: bash
$ sbatch hostname.batch
Submitted batch job 12345

# checking with squeue, we see that the job has started running (state is 'R')
$ squeue -j 12345
   JOBID PARTITION     NAME        USER ST       TIME  NODES NODELIST(REASON)
   12345   compute     myhostjob   joec  R       5:08      1 n003

# later, we see that the job has successfully completed
$ sacct -j 12345
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
12345        myhostjob+    compute     joegrp          1  COMPLETED      0:0
12345.batch       batch                joegrp          1  COMPLETED      0:0
12345.exte+      extern                joegrp          1  COMPLETED      0:0

...

SLURM uses a number of queues, which it calls partitions, to run jobs with different properties.  Normal jobs will use the compute partition, but there are other partitions for jobs that need to run longer than a day, need more memory, or need to use GPUs, for example.  You can use the sinfo command to see detailed information about the partitions and their states.
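As a rough sketch of how sinfo might be used for this, the snippet below wraps it in a small helper.  The function name list_partitions is made up for illustration; the format fields (%P partition, %l time limit, %D node count, %a availability) are standard sinfo format specifiers.  The output will depend entirely on how your cluster is configured.

```shell
# Hypothetical helper (not part of SLURM): list partitions with their
# time limit, node count, and availability.
# Pass a partition name to restrict the listing to that partition.
list_partitions() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this on a cluster login node" >&2
        return 1
    fi
    if [ -n "$1" ]; then
        sinfo --partition="$1" --format="%P %l %D %a"
    else
        sinfo --format="%P %l %D %a"
    fi
}
```

Running list_partitions with no argument lists every partition; list_partitions compute limits the listing to the compute partition.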

...

Code Block
language: text
srun --account=hpcrcf --pty --partition=compute --mem=1024M --time=240 bash

Most of the flags work just as in batch scripts, above.  This command requests a single core (the default) on a node in the compute partition, with 1024 MB of memory, for four hours.  (After four hours, the shell and all commands running under it will be killed.)

...

Code Block
language: text
srun --account=hpcrcf --x11 --pty --partition=compute --mem=4G --time=4:00 bash

...

Code Block
language: text
xrun --account=hpcrcf --partition=compute --mem=1024 --time=4 bash

Note that this script supports only a subset of the srun flags and has some quirks.  The --mem value must be a whole number and is in megabytes.  The --time value is in HH:MM format, but if only a whole number is specified, it is taken as hours (rather than minutes).
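To make the --time rule concrete, here is a small sketch that converts an xrun-style time value to minutes.  The function xrun_time_to_minutes is hypothetical and only mirrors the rule as described above; it is not taken from the xrun script itself.

```shell
# Hypothetical helper: convert an xrun-style --time value to minutes.
# Assumes the rule described above: HH:MM, or a bare whole number meaning hours.
xrun_time_to_minutes() {
    printf '%s\n' "$1" | awk -F: '{
        if (NF == 2) print $1 * 60 + $2   # HH:MM form
        else         print $1 * 60        # bare number, taken as hours
    }'
}
```

Under this reading, xrun_time_to_minutes 4 and xrun_time_to_minutes 4:00 both print 240, the same duration as --time=240 in the first srun example.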

...

For time, the default varies by partition, but is generally the maximum available for the partition.  For the compute partition, the default is currently 24 hours.  For the computelong partition, it's currently two weeks.  If your job will take significantly less time, you can specify a shorter duration to increase the odds that it will be scheduled sooner.
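As an illustration, a batch script might request a shorter limit like this.  It is only a sketch: the two-hour figure is an arbitrary example, and the account is a placeholder as in the script above.  The days-hours:minutes:seconds form is one of the time formats that sbatch accepts.

```shell
#!/bin/bash
#SBATCH --account=<myaccount>   ### change this to your actual account for charging
#SBATCH --partition=compute     ### queue to submit to
#SBATCH --time=0-02:00:00       ### two hours (days-hours:minutes:seconds), not the partition default

hostname
```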

...