Talapas has a special SLURM partition (queue) named preempt that provides low-priority access to almost every compute node in the cluster, even compute nodes that you normally wouldn't have permission to use (e.g., because they're condo nodes owned by a lab). If you're able to use this partition for your job, it might be scheduled for execution much sooner than it would be on one of the ordinary queues.

...

An additional limitation is that jobs in preempt are currently limited to seven days and may use at most eight compute nodes.
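These two limits can be checked before a job is submitted. The Python sketch below parses a Slurm `--time` value (the forms listed in the `sbatch` documentation: `M`, `M:S`, `H:M:S`, `D-H`, `D-H:M`, `D-H:M:S`) and compares it, along with a node count, against the seven-day and eight-node limits. The helper names `parse_slurm_time` and `check_preempt_request` are hypothetical, and the limits are hard-coded from this page, so they would need updating if the partition's configuration changes.

```python
from datetime import timedelta

# Limits for the preempt partition as stated on this page (hypothetical
# constants; the real limits may change).
PREEMPT_MAX_TIME = timedelta(days=7)
PREEMPT_MAX_NODES = 8


def parse_slurm_time(spec: str) -> timedelta:
    """Convert a Slurm --time value to a timedelta.

    Forms handled: "M", "M:S", "H:M:S", "D-H", "D-H:M", "D-H:M:S".
    """
    days = 0
    if "-" in spec:
        day_part, spec = spec.split("-", 1)
        days = int(day_part)
        fields = [int(x) for x in spec.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognized time spec: {spec!r}")
        # After a day count the fields are hours[:minutes[:seconds]].
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            hours = 0
            minutes, seconds = fields
        elif len(fields) == 3:
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognized time spec: {spec!r}")
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def check_preempt_request(time_spec: str, nodes: int) -> list:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if parse_slurm_time(time_spec) > PREEMPT_MAX_TIME:
        problems.append(f"time {time_spec} exceeds the 7-day limit")
    if nodes > PREEMPT_MAX_NODES:
        problems.append(f"{nodes} nodes exceeds the 8-node limit")
    return problems


if __name__ == "__main__":
    print(check_preempt_request("7-00:00:00", 8))  # fits: []
    print(check_preempt_request("8-00", 10))       # reports two problems
```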

Submission

Submitting a job to the preempt partition is largely the same as submitting it to any other queue. As above, if you don't want your job to be automatically requeued, you'll need to include this option:

...

The default time limit for submitted jobs is seven days, and the default memory is about 4200 MB (the same as for the short partition). However, because all of the compute nodes are available for scheduling, you can request any combination of resources that can be satisfied by any one of our compute nodes. For example, you could request 800 GB of memory; this would result in the job being run on one of our "fat" nodes, since only those nodes have that much memory. Similarly, you could request one or more GPUs, which would cause the job to be scheduled only on a node that has GPUs. As always, the fewer resources you request, the sooner your job is likely to run.
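As an illustration of how such requests look in a batch script, the hypothetical Python helper below assembles an `#SBATCH` header for the preempt partition. It uses only standard `sbatch` options (`--partition`, `--nodes`, `--ntasks`, `--time`, `--mem`, `--gres`). Leaving `time` or `mem` unset keeps the partition defaults described above. The GRES name `gpu` is an assumption about how the cluster's GPUs are configured.

```python
def preempt_header(time=None, mem=None, gpus=0, nodes=1, ntasks=1):
    """Build the #SBATCH header for a job on the preempt partition.

    time: Slurm time string such as "2-00:00:00"; None keeps the default.
    mem:  per-node memory with a unit suffix such as "800G"; None keeps the default.
    gpus: GPUs per node; 0 requests none.
    """
    lines = [
        "#!/bin/bash",
        "#SBATCH --partition=preempt",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks={ntasks}",
    ]
    if time is not None:
        lines.append(f"#SBATCH --time={time}")
    if mem is not None:
        # A large value such as 800G can only be met by a "fat" node.
        lines.append(f"#SBATCH --mem={mem}")
    if gpus:
        # Restricts scheduling to nodes with GPUs (GRES name "gpu" assumed).
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(preempt_header(mem="800G"))
    print(preempt_header(time="12:00:00", gpus=1))
```

Each extra constraint narrows the set of eligible nodes, which is why the smallest request that still fits the job tends to start soonest.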

Strategy

Because preempt jobs will be killed as needed, it's worth thinking about how to reduce the probability that this will happen to your job. Using a smaller CPU core count might help. Intuitively, fewer CPUs means fewer chances that one of the CPUs your job is using will get "unlucky". How much this helps depends on how core count affects your job's run time. If the choice is between a job that uses two CPUs for eight hours and one that uses eight CPUs for two hours, the chances of being preempted might be quite similar, since both amount to 16 CPU-hours. Usually jobs aren't perfectly parallelized, though: quadrupling the CPU count cuts the run time by less than a factor of four, so the job ties up more CPU-hours, and reducing the CPU count could be a win.
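The trade-off above can be made concrete with a deliberately simple toy model. It assumes, hypothetically, that each allocated CPU is reclaimed independently at a constant rate per CPU-hour, so a job holding `cpus` CPUs for `hours` hours survives with probability exp(-rate * cpus * hours). It also models run time with Amdahl's law, where `serial_fraction` is the part of the work that does not parallelize. Real preemption is driven by competing jobs rather than a fixed rate, so this only illustrates the intuition. The function names and the rate value are illustrative assumptions.

```python
import math


def runtime_hours(serial_hours: float, cpus: int, serial_fraction: float) -> float:
    """Amdahl's law: wall-clock hours on `cpus` CPUs for a job that takes
    `serial_hours` on one CPU and has the given non-parallelizable fraction."""
    return serial_hours * (serial_fraction + (1.0 - serial_fraction) / cpus)


def survival_probability(cpus: int, hours: float, rate_per_cpu_hour: float) -> float:
    """Toy model: each CPU is reclaimed independently at a constant rate,
    so P(no preemption) = exp(-rate * cpus * hours)."""
    return math.exp(-rate_per_cpu_hour * cpus * hours)


if __name__ == "__main__":
    RATE = 0.01  # hypothetical preemption rate per CPU-hour
    for s in (0.0, 0.2):
        for cpus in (2, 8):
            t = runtime_hours(16.0, cpus, s)
            p = survival_probability(cpus, t, RATE)
            print(f"serial={s:.1f} cpus={cpus} hours={t:.1f} "
                  f"cpu_hours={cpus * t:.1f} P(survive)={p:.3f}")
```

With a serial fraction of 0, two CPUs for eight hours and eight CPUs for two hours both use 16 CPU-hours and survive equally often. With a serial fraction of 0.2, the two-CPU run takes 9.6 hours (19.2 CPU-hours) while the eight-CPU run takes 4.8 hours (38.4 CPU-hours), so under this model the smaller request is less exposed.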

Multi-node MPI jobs are a distinct case. There is often a choice between requesting whole nodes and allowing SLURM to place individual tasks wherever is expedient. Although the latter option is attractive for other reasons, empirically it seems to increase the chances of preemption. One conjecture is that some tasks end up on nodes in the popular club partitions (e.g., 'short'). Because small jobs are submitted to these partitions frequently, there is a higher chance that one of them will collide with the preempt job. And the loss of even a single task (CPU) usually crashes the entire MPI job.
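The conjecture can be illustrated with another toy model. Assume, hypothetically, that each node hosting part of an MPI job is reclaimed independently during the run with some probability, and that losing any one node kills the whole job. The job's survival probability is then the product of the per-node survival probabilities, so spreading the same tasks over more nodes, particularly busy ones, lowers it. The probabilities below are made up for illustration. In a batch script, whole-node placement is usually requested by fixing `--nodes` and `--ntasks-per-node`, optionally with `--exclusive`.

```python
from math import prod


def mpi_survival(node_reclaim_probs):
    """P(MPI job finishes) when losing any single node's tasks kills the job.

    node_reclaim_probs: per-node probability (hypothetical) that the node is
    reclaimed by another job during the run.
    """
    return prod(1.0 - p for p in node_reclaim_probs)


if __name__ == "__main__":
    # Whole-node placement: all tasks packed onto 2 nodes.
    packed = [0.05, 0.05]
    # Scattered placement: the same tasks over 6 nodes, 3 of them in busy
    # shared partitions where small jobs arrive often.
    scattered = [0.05, 0.05, 0.05, 0.15, 0.15, 0.15]
    print(f"packed:    {mpi_survival(packed):.3f}")
    print(f"scattered: {mpi_survival(scattered):.3f}")
```

Under these assumed numbers, the packed job survives about 90% of the time and the scattered one only about half the time.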

See How-to Submit a MPI Job for additional information.


...