

  1. Create a job script. A job script describes the computational resources your job requires and the executables you wish to run. Let's look at a "hello world" example of a job script:


    Code Block
    #!/bin/bash
    #SBATCH --partition=long        ### Partition (like a queue in PBS)
    #SBATCH --job-name=HiWorld      ### Job Name
    #SBATCH --output=Hi.out         ### File in which to store job output
    #SBATCH --error=Hi.err          ### File in which to store job error messages
    #SBATCH --time=0-00:01:00       ### Wall clock time limit in Days-HH:MM:SS
    #SBATCH --nodes=1               ### Number of nodes needed for the job
    #SBATCH --ntasks-per-node=1     ### Number of tasks to be launched per Node
    #SBATCH --account=<myPIRG>      ### Account used for job submission
    ./a.out                         # run your actual program

    Above we see the contents of our SLURM script (aka job script), called hello.srun. Notice that the script begins with #!/bin/bash. This line tells Linux which interpreter to use when executing the script. Here we used bash (the Bourne Again Shell), which is by far the most common choice, but other interpreters such as tcsh or python could be used. Whatever your choice, every script should begin with an interpreter directive.

    Next, we see a collection of specially formatted comments, each beginning with #SBATCH followed by an option definition. The sbatch command reads these to set job options, while bash ignores them as ordinary comments. This lets us describe our job to the scheduler and reserve the resources it needs (cores, memory, etc.) for an appropriate amount of time.
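
    To see the difference between what bash runs and what sbatch reads, here is a minimal sketch that rebuilds a script like hello.srun and inspects it without needing a cluster. It keeps the placeholder partition and account from the example and swaps ./a.out for an echo command (a stand-in, not part of the original example), so it runs on any machine with bash.

```shell
#!/bin/bash
# Rebuild a hello.srun-style script. The echo line is a stand-in for ./a.out
# so the sketch runs anywhere; "long" and "<myPIRG>" are placeholders.
printf '%s\n' \
  '#!/bin/bash' \
  '#SBATCH --partition=long' \
  '#SBATCH --job-name=HiWorld' \
  '#SBATCH --output=Hi.out' \
  '#SBATCH --error=Hi.err' \
  '#SBATCH --time=0-00:01:00' \
  '#SBATCH --nodes=1' \
  '#SBATCH --ntasks-per-node=1' \
  '#SBATCH --account=<myPIRG>' \
  'echo "Hello, world"' > hello.srun

# To bash, each #SBATCH line is an ordinary comment, so running the file
# directly executes only the echo.
bash hello.srun

# These are the lines sbatch reads as job options.
grep '^#SBATCH' hello.srun
```

    Running it prints Hello, world followed by the eight directives. One detail from the sbatch documentation worth keeping in mind: sbatch stops processing #SBATCH lines once it reaches the first non-comment, non-blank line, so directives placed after a command are not applied.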

    The --time limit must be long enough for the job to complete, since the job is killed when time runs out, but avoid needlessly overestimating it. Shorter jobs are more likely to start sooner, because the scheduler can fit them into gaps between longer jobs that cannot run yet.
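
    The --time value in the script uses the Days-HH:MM:SS form. When comparing a request against how long a job actually ran, it can help to convert it to seconds. The function below is a hypothetical helper, not a SLURM command, and it handles only the full D-HH:MM:SS form; sbatch itself also accepts shorter forms such as a plain number of minutes.

```shell
#!/bin/bash
# Hypothetical helper: convert a --time value written as D-HH:MM:SS into seconds.
# Only the full D-HH:MM:SS form is handled; shorter forms are out of scope.
time_to_seconds() {
  local spec=$1 days hms h m s
  days=${spec%%-*}              # part before the dash, e.g. "0"
  hms=${spec#*-}                # part after the dash, e.g. "00:01:00"
  IFS=: read -r h m s <<< "$hms"
  # 10# forces base 10, so fields such as "08" are not read as octal.
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

time_to_seconds 0-00:01:00    # hello-world request: prints 60
time_to_seconds 1-03:53:46    # prints 100426
```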

    Note that the script suffix we used is unimportant. You can name your job scripts whatever you wish.

  2. Submit your job to the scheduler using the sbatch command.

    Code Block
    [duckID@login1 helloworld]$ sbatch hello.srun 
    Submitted batch job 20190
    [duckID@login1 helloworld]$

    Our job has been submitted and assigned job number 20190, which serves as its primary identifier.
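
    If you script your submissions, you will usually want to keep that job number for later squeue or scancel commands. A minimal sketch, assuming the confirmation message has the "Submitted batch job N" form shown above, is to take its last word. The message is hard-coded here so the snippet runs without a cluster; on a login node you would capture it with msg=$(sbatch hello.srun). sbatch also has a --parsable option that prints only the job ID (followed by ;clustername on multi-cluster systems), which avoids the parsing step.

```shell
#!/bin/bash
# Extract the job ID from sbatch's confirmation message.
# On a cluster: msg=$(sbatch hello.srun)
msg="Submitted batch job 20190"     # copied from the session above
jobid=${msg##* }                    # drop everything up to the last space
echo "Job ID: $jobid"               # prints "Job ID: 20190"
```

    The saved ID can then be reused, for example squeue -j "$jobid" to check on just that job.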

  3. Check on your job using the squeue command.

    Code Block
    [duckID@login1 helloworld]$ squeue
                 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 20190      long  HiWorld   duckID CG       1:09      1 n074
          20123_[1-35]      long RSA_09_c    user1 PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:hpc-hn2,ln[1-2],n[005,120,122])
          20017_[5-20]   longgpu pressure    user2 PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:hpc-hn2,ln[1-2],n[005,120,122])
               20017_4   longgpu pressure    user2  R 1-03:53:46      1 n110
               20017_3   longgpu pressure    user2  R 1-06:34:31      1 n109
                 19468   longfat     bash    user3  R 11-21:16:00      1 n123
               20017_2   longgpu pressure    user2  R 1-21:25:54      1 n119
              19995_20   longgpu pressure    user2  R 3-10:28:49      1 n106
              19995_19   longgpu pressure    user2  R 3-19:37:26      1 n104
               19995_3   longgpu pressure    user2  R 4-05:01:45      1 n111
               19995_4   longgpu pressure    user2  R 4-05:01:45      1 n112
               19995_5   longgpu pressure    user2  R 4-05:01:45      1 n113
              19995_11   longgpu pressure    user2  R 4-05:01:45      1 n100
               20017_0   longgpu pressure    user2  R 2-03:41:35      1 n107
               20017_1   longgpu pressure    user2  R 2-03:41:35      1 n118
                 20189       fat build-R-    user4  R       7:42      1 n121
                 20188       fat build-R-    user4  R      13:00      1 n124
                 20177      defq make-sil    user4  R    1:43:44     72 n[006-073,075-078]
    [duckID@login1 helloworld]$

    Here we see that our job, number 20190, is in the CG (completing) state. The other jobs in the system are in the R (running) or PD (pending) state. A job stays pending when there are not enough resources available to satisfy the request in its job script; the reason is shown in parentheses in the last column. In this case, the system was scheduled for maintenance, and the wall clock limits of those jobs would have let them run into the maintenance period. They will run once the maintenance is complete and its reservation is removed from the system. To view only your own jobs, use the -u option followed by your user ID, e.g., squeue -u duckID.
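
    Because squeue prints plain whitespace-separated columns, its output is easy to summarize with standard tools. The sketch below counts jobs per state using the ST (fifth) column. To keep it runnable anywhere, a hypothetical function replays a few rows taken from the listing above instead of calling squeue; on a cluster you would pipe squeue (for example squeue -u duckID) into the same awk command.

```shell
#!/bin/bash
# Stand-in for squeue: a header plus a few rows taken from the listing above.
sample_squeue() {
  printf '%s\n' \
    '             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)' \
    '      20017_[5-20]   longgpu pressure    user2 PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:hpc-hn2,ln[1-2],n[005,120,122])' \
    '           20017_4   longgpu pressure    user2  R 1-03:53:46      1 n110' \
    '             19468   longfat     bash    user3  R 11-21:16:00      1 n123' \
    '             20189       fat build-R-    user4  R       7:42      1 n121' \
    '             20177      defq make-sil    user4  R    1:43:44     72 n[006-073,075-078]'
}

# Skip the header (NR > 1), tally column 5 (ST), print "STATE COUNT" lines.
summary=$(sample_squeue | awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort)
echo "$summary"
```

    With these rows it prints PD 1 and R 4. If you only need one state, squeue can filter for you, e.g., squeue -u duckID -t PENDING.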

  4. If necessary, cancel your job using the scancel command followed by the job number of the job you wish to cancel.

    Code Block
    [duckID@login1 helloworld]$ scancel 20190
    [duckID@login1 helloworld]$ squeue -u duckID
    [duckID@login1 helloworld]$
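
    When cancelling from a script, a small wrapper makes the command safe to try out on a machine without SLURM. The sketch below is a hypothetical helper, not part of SLURM: it calls scancel when the command is available and otherwise only prints what it would run. Besides a single job number, scancel also accepts an array task such as 20017_4, or -u followed by a user ID to cancel all of that user's jobs.

```shell
#!/bin/bash
# Hypothetical wrapper: run scancel if it exists, otherwise do a dry run.
cancel_job() {
  if command -v scancel >/dev/null 2>&1; then
    scancel "$@"
  else
    echo "dry run: scancel $*"
  fi
}

cancel_job 20190        # on a login node this cancels job 20190
```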