Comment: mv ReqNodeNotAvail to its own page, rename

...

5. "Why is my job pending with a 'ReqNodeNotAvail' message when I type squeue?"

The most likely explanation is that your job cannot start yet because its requested run time would overlap with an existing reservation, e.g. a reservation made for a maintenance outage.  If this is the case, and you know that your job can complete before the outage window begins, you can change the TimeLimit of your queued job as follows:

scontrol update jobid=1234567 TimeLimit=2-12:00:00

This will change the TimeLimit of job 1234567 to 2 days and 12 hours.  Note that Slurm normally lets regular users reduce, but not increase, the time limit of their own jobs.  To submit new jobs with non-default time limits, use the "--time" option.  For example, to submit a job to the long queue for only 4 days rather than the default 14 days, add this SBATCH directive to your batch script:

#SBATCH --time=4-00:00:00

If the maintenance window is scheduled for the next day and you want an interactive job on the short queue for just six hours rather than the default 24 hours, try this:

srun -p short --time=0-06:00:00 --pty bash -i
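The time limits above use Slurm's D-HH:MM:SS format.  As a minimal sketch, the helper below converts a number of seconds (for example, the time remaining before an outage begins) into that format.  The function name slurm_time is hypothetical and is not part of Slurm.

```shell
# Hypothetical helper (not a Slurm command): convert a whole number of
# seconds into Slurm's D-HH:MM:SS time format.
slurm_time() (
    total=$1
    days=$(( total / 86400 ))
    hours=$(( (total % 86400) / 3600 ))
    mins=$(( (total % 3600) / 60 ))
    secs=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$mins" "$secs"
)

# Example: an outage that begins in 60 hours leaves room for a
# 2-day, 12-hour job.
slurm_time $(( 60 * 3600 ))   # prints 2-12:00:00
```

The result can be passed to "scontrol update jobid=... TimeLimit=..." or to "--time".  In practice you may want to request somewhat less than the full time remaining, since the job must finish before the reservation starts.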

Efforts will be made to communicate maintenance outages 30 days, 14 days, and 1 day before the outage begins.  In addition, the current maintenance schedule for Talapas is published here:

Scheduled Maintenance Windows
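You can also check from a login node whether a reservation is what is holding your job.  The sketch below wraps two standard Slurm commands in a function; the function name show_pending_info is hypothetical, and the output columns chosen here are just one reasonable selection.

```shell
# Hypothetical wrapper (not a Slurm command): list reservations and show
# the time limit and pending reason for each of your jobs.
show_pending_info() {
    if command -v scontrol >/dev/null 2>&1; then
        # List all reservations, including their start and end times.
        scontrol show reservation
        # Columns: job ID, partition, state, time limit, pending reason.
        squeue -u "${USER:-$(id -un)}" -o "%.10i %.9P %.8T %.12l %r"
    else
        echo "Slurm commands not found; run this on a cluster login node."
    fi
}
```

Run show_pending_info and compare each pending job's time limit with the start time of the next reservation.  If the job's time limit extends past that start time, the job cannot start until the reservation ends unless its time limit is shortened.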

If you get this "ReqNodeNotAvail" message and there is no maintenance scheduled, please leave the job in the queue and submit a ticket, and we will investigate.

See FAQ: Why is my SLURM job not running?