Clipper is equipped with 22 NVIDIA GPUs for research use. This article details how to request GPUs in Slurm jobs.
Available GPUs
| Nodes | GPUs per Node | Model | Compute Capability* | Slurm Type Specifier | Shards per GPU |
|------------|---|-------------------------|-----|------------------|-----|
| g[001-004] | 2 | NVIDIA Tesla V100s      | 7.0 | tesla_v100s      | 24  |
| g[005-008] | 2 | NVIDIA Quadro RTX 8000  | 7.5 | quadro_rtx_8000  | 40  |
| g[050-052] | 2 | NVIDIA H100 NVL         | 9.0 | nvidia_h100_nvl  | N/A |
* NVIDIA’s CUDA compute capability indicates the features and generation of a specific GPU. See: https://developer.nvidia.com/cuda-gpus
What option is right for my research?
The NVIDIA H100 NVL is one of the latest generations of NVIDIA cards and excels at all workloads. It will be the fastest of all options available in Clipper, regardless of task. It is rated to be about 3.5 times faster than the Tesla V100s in double-precision calculations. The H100 NVL has 94 GB of high-speed HBM3 memory versus the V100s' 32 GB of HBM2 memory.
The Quadro cards lack double-precision performance, maxing out at around 500 GFLOPS (billion FLOPS) for double-precision calculations. That’s about 16 times slower than the Tesla V100s.
Single-precision performance between the V100s and RTX 8000 is similar.
Single vs. Double Precision
Single-precision calculations use 32 bits to represent a number. Double-precision calculations use 64 bits to represent a number.
Single-precision offers acceptable accuracy for tasks like graphics or machine learning, where some error tolerance is possible. Double-precision calculations provide a wider range of numbers and higher precision, crucial for scientific simulations where tiny differences matter.
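The difference is easy to see by round-tripping a value through 32-bit storage. A minimal Python sketch using only the standard library (Python floats are 64-bit doubles by default, so packing one into 32 bits shows the precision lost):

```python
import struct

x = 0.1  # stored as a 64-bit (double-precision) float by default in Python

# Round-trip x through 32-bit (single-precision) storage
single = struct.unpack("f", struct.pack("f", x))[0]

print(f"double: {x:.17f}")
print(f"single: {single:.17f}")
print(f"error:  {abs(single - x):.2e}")
```

The error is tiny per operation, but in a long simulation these rounding errors can accumulate, which is why double precision matters for numerically sensitive work.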
Requesting GPUs in a Slurm Job
GPUs are not available in a Slurm job without explicitly requesting them.
Requesting Whole GPUs
To request one or more GPUs per node in a Slurm job, use the following in your submission:
--gpus-per-node=[type:]number
For example, to request a single GPU:
--gpus-per-node=1
The above will allocate any free GPU to a job. To request a specific GPU type, you must use the type specifier from the table above:
--gpus-per-node=tesla_v100s:1
--gpus-per-node=quadro_rtx_8000:1
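In practice these options go in a batch script alongside your other #SBATCH directives. A minimal sketch; the job name, time limit, and application command are placeholders, not Clipper requirements:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example          # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gpus-per-node=tesla_v100s:1   # one V100s; omit the type to accept any free GPU
#SBATCH --time=01:00:00                 # placeholder time limit

# Slurm sets CUDA_VISIBLE_DEVICES so the job sees only its allocated GPU
nvidia-smi
```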
Requesting GPU Shards
A feature of Slurm is GPU sharding. Sharding allows you to share a single GPU among multiple jobs. The V100s and Quadro cards can be sharded.
Please be aware that Slurm doesn’t actively monitor or enforce GPU usage within shards. Jobs need to be well-behaved and respect their allocated shard memory. A GPU also cannot be used simultaneously as both a shard and a gpu Slurm resource.
To request GPU shards (in this case, 12 shards):
--gres=shard:12
As with requesting whole GPUs, you can use the type specifier from the table above to request shards from a specific GPU type:
--gres=shard:tesla_v100s:12
--gres=shard:quadro_rtx_8000:12
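As with whole GPUs, shard requests usually appear in a batch script. A minimal sketch; the script name my_model.py and the other directives are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=shard-example            # placeholder job name
#SBATCH --ntasks=1
#SBATCH --gres=shard:tesla_v100s:12         # 12 shards of a V100s
#SBATCH --time=01:00:00                     # placeholder time limit

# The job sees a slice of the GPU; keep within your share of GPU
# memory, since Slurm does not enforce the limit within shards.
python3 my_model.py
```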
NVIDIA Multi-Process Service
NVIDIA Multi-Process Service is enabled as a generic resource. MPS, like sharding, allows sharing GPU resources among multiple jobs. The V100s and Quadro cards can utilize MPS.
Only one user on a system may use MPS at a time. GVSU ARC recommends using sharding instead of MPS.
Tracking GPU Usage
NVIDIA Data Center GPU Manager monitors the health and usage of GPUs in the Clipper cluster. Slurm collects and saves GPU statistics alongside the job output in a separate text file.
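For a quick interactive check of GPU utilization from a GPU node (separate from the DCGM statistics Slurm collects), the query mode of nvidia-smi can be used; this is a generic NVIDIA tool, not Clipper-specific:

```bash
# Report per-GPU utilization and memory use in CSV form
nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv
```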
More Information