Service Unit Calculation
Deprecated: the information below is kept only for historical purposes.
Premise: service units are rooted in the concept that, when using the base compute node, 1 CPU = 1 Service Unit. For all compute nodes, the service units consumed by a job are given by
Service Units = SUM over allocated nodes(max(AllocCPU/TotCPU, AllocRAM/TotRAM, AllocGRES/TotGRES) * NTF) * 28 Service Units/hour * job duration in hours
The idea here is that a job's usage effectively amounts to the largest fraction of resources the job uses on a node. For instance, if a job uses all the available cores on a node but little memory, then the job is using 100% of the node (i.e., there are no cores available for other jobs). Likewise, if a job uses only one core but requires 100% of the memory on a node, that job is also using 100% of the node (there is insufficient memory for other jobs).
The service unit formula is normalized so that using one standard node (28 Broadwell cores) for one hour consumes 28 SUs. However, when using a rarefied resource, a multiplicative factor applies. This resource may be a more recent generation of node (e.g. a Skylake CPU), a node with specialized hardware (e.g. a GPU), or a node with a particular function (e.g. a large memory server). The multiplicative factors for node types (NTFs) are based broadly on the cost disparity between these resources, reflect considerations such as core count, core performance, and memory, and may be adjusted over time as part of the core facility rate-setting process.
Current Node Type Factors (NTFs) are:
- Standard compute nodes: NTF=1
- GPU equipped nodes: NTF=2
- Large memory (fat) nodes: NTF=6
- Skylake compute nodes: TBD
Note: There are three available memory configurations for fat nodes (1, 2 and 4TB), so to maintain consistency across fat nodes, the TotRAM value for fat nodes is based on the minimum memory configuration of 1 TB (1024GB).
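Under the assumptions above, the formula can be sketched in Python. The function names and argument layout here are illustrative, not part of any scheduler API:

```python
# Sketch of the SU formula above; names are illustrative, not a cluster API.
BASE_SU_PER_HOUR = 28  # one standard node (28 Broadwell cores) for one hour

def node_su_rate(alloc_cpu, tot_cpu, alloc_ram, tot_ram,
                 alloc_gres=0, tot_gres=0, ntf=1):
    """SU/hour for one node: largest allocated resource fraction times the NTF."""
    gres_frac = alloc_gres / tot_gres if tot_gres else 0.0
    frac = max(alloc_cpu / tot_cpu, alloc_ram / tot_ram, gres_frac)
    return frac * ntf * BASE_SU_PER_HOUR

def job_su(node_rates, hours):
    """Total SUs: sum of the per-node rates times the job duration in hours."""
    return sum(node_rates) * hours

# Example 1 below: 14 cores and 32 GB on a standard node (28 cores, 128 GB), 10 hours
print(job_su([node_su_rate(14, 28, 32, 128)], 10.0))  # 140.0
```

The same two functions reproduce the remaining examples by swapping in the GPU, fat-node, or multi-node allocations and the matching NTF.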
See examples below for applications of this formula:
Example 1 (CPU driven SU):
User A submits a job that is allocated 14 cores and 32 GB of RAM on one standard compute node. Each compute node has a total of 28 cores and 128GB of RAM. The job runs for 10 hours. The job would have consumed
max(14/28, 32/128, 0) * 1 * 28 SU/hr * 10.0 hr = max(0.5, 0.25, 0) * 280 SU = 140 SU
Example 2 (Memory driven SU):
User B submits a job that is allocated 7 cores and 128GB of RAM and one GPU on a GPU node. Each GPU node has a total of 28 cores and 256GB of RAM and 4 GPUs. The job runs for 10 hours. Then the job would have consumed
max(7/28, 128/256, 1/4) * 2 * 28 SU/hr * 10.0 hr = max(0.25, 0.50, 0.25) * 560 SU = 280 SU
Example 3 (GPU driven SU):
User C submits a job to the GPU partition and that job is allocated 1 core, 16 GB of RAM, and 3 GPUs. The nodes in the GPU partition have 28 CPUs, 256 GB of RAM, and 4 GPUs. This job runs for 10 hours and will have consumed
max(1/28, 16/256, 3/4) * 2 * 28 SU/hr * 10.0 hr = max(0.036, 0.063, 0.75) * 560 SU = 420 SU
Example 4 (CPU driven SU on Fat nodes):
User D submits a job to the fat partition that is allocated 42 of the 56 available CPUs and 512 GB of memory. The job finishes in 10 hours and will have consumed
max(42/56, 512/1024, 0) * 6 * 28 SU/hr * 10.0 hr = max(0.75, 0.5, 0) * 1680 SU = 1260 SU
Example 5 (Memory driven SU on Fat nodes):
User E submits a job to the fat partition that is allocated 4 of the 56 available CPUs and 2 TB (2048 GB) of memory. The job finishes in 10 hours and will have consumed
max(4/56, 2048/1024, 0) * 6 * 28 SU/hr * 10.0 hr = max(0.071, 2.0, 0) * 1680 SU = 3360 SU
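Note that because TotRAM on fat nodes is normalized to the 1 TB minimum configuration, the memory fraction can exceed 1.0, as a quick self-contained check shows (names illustrative):

```python
# Fat-node memory fraction uses the 1 TB (1024 GB) baseline, so it can exceed 1.0.
frac = max(4/56, 2048/1024, 0)   # = 2.0, memory-driven
su = frac * 6 * 28 * 10.0        # NTF=6 fat node, 10-hour job
print(frac, su)  # 2.0 3360.0
```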
Example 6 (Multiple standard nodes):
User F submits a job that is allocated 16 standard nodes (28 cores and 128 GB of RAM per node, totaling 448 cores and 2048GB of memory). The job runs for 10 hours and will have consumed
16 nodes * max(28/28, 128/128, 0) * 1 * 28 SU/hr * 10.0 hr = 16 * max(1, 1, 0) * 280 SU = 4480 SU
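The per-node sum in this multi-node example can be checked with a short self-contained sketch (the function name and argument shape are illustrative):

```python
# Multi-node SU: sum per-node charges (fraction * NTF), then scale by base rate and hours.
def su_total(nodes, hours, base_su_per_hour=28):
    """nodes: list of (resource_fraction, ntf) pairs, one per allocated node."""
    return sum(frac * ntf for frac, ntf in nodes) * base_su_per_hour * hours

# Example 6: 16 fully used standard nodes (fraction 1.0, NTF 1) for 10 hours
print(su_total([(max(28/28, 128/128, 0), 1)] * 16, 10.0))  # 4480.0
```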
These examples are subject to change.