...

Create a job script. A job script is a description of the computational resources your job requires and the executables you wish to run. Let's look at a "hello world" example of a job script:

hello.srun

Code Block

language	bash

#!/bin/bash
#SBATCH --partition=computelong ### Partition (like a queue in PBS)
#SBATCH --job-name=HiWorld      ### Job Name
#SBATCH --output=Hi.out         ### File in which to store job output
#SBATCH --error=Hi.err          ### File in which to store job error messages
#SBATCH --time=0-00:01:00       ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1               ### Number of nodes needed for the job
#SBATCH --ntasks-per-node=1     ### Number of tasks to be launched per Node
#SBATCH --account=<myPIRG>      ### Account used for job submission

./a.out							# run your actual program

Above we see the contents of our SLURM script (aka job script) called hello.srun. (The name is arbitrary; use whatever name you like.) Notice that the script begins with #!/bin/bash. This line tells Linux which interpreter to use when executing the script. Here we used bash (the Bourne Again Shell), which is by far the most common choice, but other interpreters could be used (e.g., tcsh or python). Whatever your choice, every script should begin with an interpreter directive.

Next, we see a collection of specially formatted comments, each beginning with #SBATCH followed by option definitions. These are used by the sbatch command to set job options. (As comments, they are ignored by bash.) This allows us to describe our job to the scheduler and ensure that we reserve the appropriate resources (cores, memory, etc.) for an appropriate amount of time.
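To make the "comments" point concrete, here is a minimal sketch that runs without Slurm. It writes a small illustrative script to a temporary file, runs it directly under bash to show that the #SBATCH lines produce nothing, and then lists those lines with grep. The file contents and the variable names (script, body_output, directives) are assumptions for illustration. The grep is only a rough way to view the options, not a reproduction of how sbatch parses them.

```bash
# Write a tiny job script to a temporary file (contents are illustrative).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=HiWorld
#SBATCH --time=0-00:01:00
echo "hello from the job body"
EOF

# Run it directly under bash: the #SBATCH lines are ordinary comments here,
# so only the echo produces output.
body_output=$(bash "$script")
echo "$body_output"

# List the option definitions carried by the #SBATCH comment lines.
directives=$(grep '^#SBATCH' "$script" | sed 's/^#SBATCH[[:space:]]*//')
echo "$directives"

rm -f "$script"
```

The same script therefore behaves like any other shell script when run by hand, while the scheduler-specific settings travel with it as comments.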

The specified --time needs to be long enough for the job to complete (lest it be killed when time runs out), but avoid needlessly overestimating it. Shorter jobs are likely to start sooner, as the scheduler can fit them into gaps between longer jobs that aren't yet runnable.
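If you have an estimate of your run time in seconds, a small helper can turn it into the Days-HH:MM:SS form used in the job script. The function name to_slurm_time is hypothetical (it is not part of Slurm), and the sketch assumes a non-negative whole number of seconds.

```bash
# to_slurm_time: hypothetical helper, not part of Slurm.
# Converts a number of seconds into Days-HH:MM:SS.
to_slurm_time() {
    total=$1
    days=$(( total / 86400 ))
    hours=$(( (total % 86400) / 3600 ))
    minutes=$(( (total % 3600) / 60 ))
    seconds=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

to_slurm_time 60      # the one-minute limit used in hello.srun: 0-00:01:00
to_slurm_time 5400    # ninety minutes: 0-01:30:00
```

A reasonable habit is to time a short test run, add a safety margin, and convert the result, rather than requesting the partition maximum.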

Note that the script suffix we used is unimportant. You can name your job scripts whatever you wish.

Submit your job to the scheduler using the sbatch command.

Code Block

language	text

[duckID@login1 helloworld]$ sbatch hello.srun
Submitted batch job 20190
[duckID@login1 helloworld]$

Our job has been submitted and assigned the job number 20190, which will serve as its primary identifier.
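Since the job number is what later commands need, it can be handy to capture it in a shell variable. The sketch below pulls the number out of the confirmation line shown above. The helper name parse_job_id is hypothetical, and the sample line is hard-coded so the snippet runs without a scheduler.

```bash
# parse_job_id: hypothetical helper that returns the last
# space-separated word of sbatch's confirmation line.
parse_job_id() {
    line=$1
    printf '%s\n' "${line##* }"
}

# In a live session this would be: submit_output=$(sbatch hello.srun)
submit_output="Submitted batch job 20190"
job_id=$(parse_job_id "$submit_output")
echo "$job_id"
```

sbatch also offers a --parsable option that prints the job ID without the surrounding text, which may be simpler if your site's Slurm version supports it.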

Check on your job using the squeue command.

Code Block

language	text

[duckID@login1 helloworld]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             20190 computelo  HiWorld   duckID CG       1:09      1 n074
      20123_[1-35] computelo RSA_09_c    user1 PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:hpc-hn2,ln[1-2],n[005,120,122])
      20017_[5-20]   longgpu pressure    user2 PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:hpc-hn2,ln[1-2],n[005,120,122])
           20017_4   longgpu pressure    user2  R 1-03:53:46      1 n110
           20017_3   longgpu pressure    user2  R 1-06:34:31      1 n109
             19468   longfat     bash    user3  R 11-21:16:00      1 n123
           20017_2   longgpu pressure    user2  R 1-21:25:54      1 n119
          19995_20   longgpu pressure    user2  R 3-10:28:49      1 n106
          19995_19   longgpu pressure    user2  R 3-19:37:26      1 n104
           19995_3   longgpu pressure    user2  R 4-05:01:45      1 n111
           19995_4   longgpu pressure    user2  R 4-05:01:45      1 n112
           19995_5   longgpu pressure    user2  R 4-05:01:45      1 n113
          19995_11   longgpu pressure    user2  R 4-05:01:45      1 n100
           20017_0   longgpu pressure    user2  R 2-03:41:35      1 n107
           20017_1   longgpu pressure    user2  R 2-03:41:35      1 n118
             20189       fat build-R-    user4  R       7:42      1 n121
             20188       fat build-R-    user4  R      13:00      1 n124
             20177      defq make-sil    user4  R    1:43:44     72 n[006-073,075-078]
[duckID@login1 helloworld]$

Here we see that our job, number 20190, is in the CG (completing) state. Notice that other jobs in the system are in the R (running) or PD (pending) state. Jobs are pending when there are insufficient resources available to accommodate the request as specified in the job script. In this case, the system was scheduled for maintenance, and the wall clock limit specified in those jobs would have allowed them to run into the maintenance period. The jobs will run once the maintenance is complete and the reservation is removed from the system. To view only your jobs, use the -u option followed by your user ID, e.g., squeue -u duckID.
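On a busy system the listing can be long, so a quick tally by state is sometimes useful. The sketch below counts jobs per state code using awk on a few rows copied from the squeue listing above. The helper names sample_squeue and count_state are hypothetical, and the approach assumes the default column layout, with ST as the fifth field and no spaces in job names.

```bash
# sample_squeue: hypothetical stand-in for running squeue itself;
# it prints a few rows copied from the listing above.
sample_squeue() {
cat <<'EOF'
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           20017_4   longgpu pressure    user2  R 1-03:53:46      1 n110
           20017_3   longgpu pressure    user2  R 1-06:34:31      1 n109
             20189       fat build-R-    user4  R       7:42      1 n121
      20017_[5-20]   longgpu pressure    user2 PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:hpc-hn2,ln[1-2],n[005,120,122])
EOF
}

# count_state: skip the header row, then count rows whose fifth
# column (ST) equals the requested state code.
count_state() {
    sample_squeue | awk -v st="$1" 'NR > 1 && $5 == st { n++ } END { print n + 0 }'
}

echo "running: $(count_state R)"
echo "pending: $(count_state PD)"
```

In a live session, replacing the body of sample_squeue with a call to squeue (for example squeue -u duckID) would apply the same tally to real output.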

If necessary, cancel your job using the scancel command followed by the job number of the job you wish to cancel.

Code Block

language	text

[duckID@login1 helloworld]$ scancel 20190
[duckID@login1 helloworld]$ squeue -u duckID
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[duckID@login1 helloworld]$

...
