Clipper is equipped with 16 NVIDIA GPUs for research use. This article details how to request GPUs in Slurm jobs.
Available GPUs
Nodes | GPUs per node | GPU model | CUDA compute capability* | Slurm type specifier |
g[001-004] | 2 | NVIDIA Tesla V100s | 7.0 | tesla_v100s | 24
g[005-008] | 2 | NVIDIA Quadro RTX 8000 | 7.5 | quadro_rtx_8000 | 40
* NVIDIA’s CUDA compute capability indicates the features and generation of a specific GPU. See: https://developer.nvidia.com/cuda-gpus
What option is right for my research?
The biggest difference between the two NVIDIA cards is their double-precision performance. The Tesla V100s units are much faster for double-precision work, reaching peak speeds over 8 TFLOPS (trillions of floating-point operations per second). In contrast, the Quadro RTX 8000 units max out at around 500 GFLOPS (billions of floating-point operations per second) for double-precision calculations, roughly one sixteenth of the Tesla V100s peak.
Single-precision performance between the two cards is similar.
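As a quick check of the double-precision gap quoted above, the ratio can be worked out with shell integer arithmetic. This is only a sketch using the rounded peak figures from this article (8 TFLOPS and 500 GFLOPS); the variable names are illustrative, and real workloads will rarely reach peak rates.

```shell
#!/bin/bash
# Rounded peak double-precision figures from the text, both in GFLOPS.
# 8 TFLOPS = 8000 GFLOPS.
v100s_gflops=8000
rtx8000_gflops=500

# Integer division is exact here: 8000 / 500 = 16.
ratio=$(( v100s_gflops / rtx8000_gflops ))
echo "Tesla V100s peak is about ${ratio}x the Quadro RTX 8000 peak (double precision)"
```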
Single vs. Double Precision
Single-precision calculations use 32 bits to represent a number. Double-precision calculations use 64 bits to represent a number.
Single-precision offers acceptable accuracy for tasks like graphics or machine learning, where some error tolerance is possible. Double-precision calculations provide a wider range of numbers and higher precision, crucial for scientific simulations where tiny differences matter.
Requesting GPUs in a Slurm Job
GPUs are not available in a Slurm job without explicitly requesting them.
Requesting Whole GPUs
To request one or more GPUs per node in a Slurm job, use the following in your submission:
--gpus-per-node=[type:]number
For example, to request a single GPU:
--gpus-per-node=1
The above will allocate any free GPU to a job. To request a specific GPU type, you must use the type specifier from the table above:
--gpus-per-node=tesla_v100s:1
--gpus-per-node=quadro_rtx_8000:1
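In a batch script, the same option can be given as an `#SBATCH` directive. The following is a minimal sketch rather than a recommended template: the job name and time limit are placeholders, and it assumes Slurm exports `CUDA_VISIBLE_DEVICES` for the allocated GPUs, which is the usual behavior but depends on the cluster configuration.

```shell
#!/bin/bash
# Placeholder job name and time limit; adjust for your own work.
#SBATCH --job-name=gpu-check
#SBATCH --nodes=1
#SBATCH --time=00:05:00
# Request one Tesla V100s per node, using the type specifier from the table.
#SBATCH --gpus-per-node=tesla_v100s:1

# Assumption: Slurm sets CUDA_VISIBLE_DEVICES for the GPUs it allocated.
# Outside a GPU allocation the variable is unset and "none" is reported.
visible_gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${visible_gpus}"

# List the GPUs if the NVIDIA tools are on the PATH (skipped otherwise).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || true
fi
```

If the script reports `none`, the job was most likely submitted without a GPU request.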
Requesting GPU Shards
Slurm supports GPU sharding, which allows multiple jobs to share a single GPU.
Please be aware that Slurm doesn’t actively monitor or enforce GPU usage within shards. Jobs need to be well-behaved and respect their allocated shard memory. A GPU also cannot be used simultaneously as both a shard and a gpu Slurm resource.
To request GPU shards (in this case, 12 shards):
--gres=shard:12
As with requesting whole GPUs, you can use the type specifier from the table above to request shards from a specific GPU type:
--gres=shard:tesla_v100s:12
--gres=shard:quadro_rtx_8000:12
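The shard request strings above follow one pattern: `shard`, an optional type specifier, then the shard count. The sketch below builds that string with a small shell function; `shard_gres` is a hypothetical helper written for illustration, not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: build a --gres option for GPU shards.
# Usage: shard_gres COUNT [TYPE]
shard_gres() {
    local count="$1"
    local type="${2:-}"
    if [ -n "$type" ]; then
        # Shards from a specific GPU type.
        printf '%s\n' "--gres=shard:${type}:${count}"
    else
        # Shards from any available GPU.
        printf '%s\n' "--gres=shard:${count}"
    fi
}

shard_gres 12               # prints --gres=shard:12
shard_gres 12 tesla_v100s   # prints --gres=shard:tesla_v100s:12
```

The result could then be passed on the command line, for example `sbatch "$(shard_gres 12 quadro_rtx_8000)" job.sh`, where `job.sh` stands in for your own batch script.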
NVIDIA Multi-Process Service
NVIDIA Multi-Process Service is enabled as a generic resource. MPS, like sharding, allows sharing GPU resources among multiple jobs.
Only one user on a system may use MPS. GVSU ARC recommends using sharding instead of MPS.
Tracking GPU Usage
NVIDIA Data Center GPU Manager monitors the health and usage of GPUs in the Clipper cluster. Slurm collects and saves GPU statistics alongside the job output in a separate text file.
More Information