TensorFlow is a free and open-source Python software library for machine learning and artificial intelligence.
This article describes how to use TensorFlow on the Clipper HPC cluster, including how to utilize the NVIDIA GPUs available on select nodes via Python. An example using the R statistical computing language is also provided.
Loading TensorFlow Virtual Environment
Advanced Research Computing support has set up a globally available Python virtual environment named py-tensorflow that contains TensorFlow and Keras. Load it with the Lmod module system command module load. The module must be loaded before TensorFlow can be used, whether you load it manually or in a Slurm batch job.
[hpcuser1@clipper ~]$ module load py-tensorflow
Loading py-tensorflow/2.14.0_py39
Loading requirement: cuda12.0/toolkit/12.0.1
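A script that depends on the module can fail early when the module was not loaded. The following is a minimal sketch, assuming the module system exports the LOADEDMODULES environment variable as a colon-separated list (as Lmod and Environment Modules typically do). The helper name module_is_loaded is hypothetical and not part of any cluster tooling.

```python
import os


def module_is_loaded(name, environ=None):
    """Return True if a module named `name` (any version) appears loaded.

    Assumption: the module system exports LOADEDMODULES as a
    colon-separated list, e.g.
    "cuda12.0/toolkit/12.0.1:py-tensorflow/2.14.0_py39".
    """
    environ = os.environ if environ is None else environ
    loaded = environ.get("LOADEDMODULES", "")
    return any(
        entry == name or entry.startswith(name + "/")
        for entry in loaded.split(":")
        if entry
    )


if __name__ == "__main__":
    if not module_is_loaded("py-tensorflow"):
        print("py-tensorflow is not loaded; run: module load py-tensorflow")
```

Passing a dictionary as environ makes the check easy to try without changing the real environment.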
The module loads a specifically configured Python installation and updates your PATH:
[hpcuser1@clipper ~]$ which python3
/cm/shared/venv/py-tensorflow-2_14_0-python3_9/bin/python3
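The same check can be made from inside Python. This sketch compares the running interpreter's path with the virtual environment path shown above. The function name uses_expected_env is hypothetical, and the prefix would need updating if the environment is rebuilt under a different path.

```python
import sys

# Virtual environment path reported by `which python3` in this article.
EXPECTED_PREFIX = "/cm/shared/venv/py-tensorflow-2_14_0-python3_9"


def uses_expected_env(executable=None, prefix=EXPECTED_PREFIX):
    """Return True if the interpreter path lives under the expected venv."""
    executable = executable or sys.executable or ""
    return executable.startswith(prefix + "/")


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("In py-tensorflow environment:", uses_expected_env())
```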
As of November 7, 2023, the environment contains the following Python packages:
[hpcuser1@clipper ~]$ pip list
Package Version
---------------------------- ------------
absl-py 1.4.0
array-record 0.5.0
astunparse 1.6.3
cachetools 5.3.2
certifi 2023.7.22
charset-normalizer 3.3.2
click 8.1.7
dm-tree 0.1.8
etils 1.5.2
flatbuffers 23.5.26
fsspec 2023.10.0
gast 0.5.4
google-auth 2.23.4
google-auth-oauthlib 1.0.0
google-pasta 0.2.0
googleapis-common-protos 1.61.0
grpcio 1.59.2
h5py 3.10.0
idna 3.4
importlib-metadata 6.8.0
importlib-resources 6.1.0
keras 2.14.0
libclang 16.0.6
Markdown 3.5.1
MarkupSafe 2.1.3
ml-dtypes 0.2.0
numpy 1.26.1
nvidia-cublas-cu11 11.11.3.6
nvidia-cuda-cupti-cu11 11.8.87
nvidia-cuda-nvcc-cu11 11.8.89
nvidia-cuda-runtime-cu11 11.8.89
nvidia-cudnn-cu11 8.7.0.84
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.3.0.86
nvidia-cusolver-cu11 11.4.1.48
nvidia-cusparse-cu11 11.7.5.86
nvidia-nccl-cu11 2.16.5
oauthlib 3.2.2
opt-einsum 3.3.0
packaging 23.2
pandas 2.1.2
Pillow 10.1.0
pip 23.3.1
promise 2.3
protobuf 3.20.3
psutil 5.9.6
pyarrow 14.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pydot 1.4.2
pyparsing 3.1.1
python-dateutil 2.8.2
pytz 2023.3.post1
requests 2.31.0
requests-oauthlib 1.3.1
rsa 4.9
scipy 1.11.3
setuptools 53.0.0
six 1.16.0
tensorboard 2.14.1
tensorboard-data-server 0.7.2
tensorflow 2.14.0
tensorflow-datasets 4.9.3
tensorflow-estimator 2.14.0
tensorflow-hub 0.15.0
tensorflow-io-gcs-filesystem 0.34.0
tensorflow-metadata 1.14.0
tensorrt 8.5.3.1
termcolor 2.3.0
toml 0.10.2
tqdm 4.66.1
typing_extensions 4.8.0
tzdata 2023.3
urllib3 2.0.7
Werkzeug 3.0.1
wheel 0.41.3
wrapt 1.14.1
zipp 3.17.0
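Because the package list reflects a specific date, it may be worth confirming versions at run time. The sketch below uses the standard-library importlib.metadata module. The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the pip list output above.
    for name in ("tensorflow", "keras", "numpy"):
        print(name, installed_version(name))
```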
Running TensorFlow from Python with GPU using Slurm Job
Create a Python file named cpu-vs-gpu.py in your /home directory containing the following code:
import timeit

import tensorflow as tf

# Stop early if TensorFlow cannot see a GPU on this node.
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print(
        '\n\nThis error most likely means that the job is not running on a '
        'GPU-enabled node. Check that the Slurm script targets a GPU '
        'partition.\n\n')
    raise SystemError('GPU device not found')


def cpu():
    # Convolve a batch of random images on the CPU.
    with tf.device('/cpu:0'):
        random_image_cpu = tf.random.normal((100, 100, 100, 3))
        net_cpu = tf.keras.layers.Conv2D(32, 7)(random_image_cpu)
        return tf.math.reduce_sum(net_cpu)


def gpu():
    # Convolve a batch of random images on the first GPU.
    with tf.device('/device:GPU:0'):
        random_image_gpu = tf.random.normal((100, 100, 100, 3))
        net_gpu = tf.keras.layers.Conv2D(32, 7)(random_image_gpu)
        return tf.math.reduce_sum(net_gpu)


# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

# Run the op several times.
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('GPU (s):')
gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))
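The script calls each function once before timing it, so one-time setup cost is excluded from the measurement. The sketch below isolates that pattern in a small helper that needs no TensorFlow. The name time_after_warmup and the stand-in workload are illustrative assumptions, not part of the script above.

```python
import timeit


def time_after_warmup(fn, number=10):
    """Call fn once to warm up, then return total seconds for `number` calls."""
    fn()  # warm-up call, excluded from the measurement
    return timeit.timeit(fn, number=number)


if __name__ == "__main__":
    # Stand-in workload; in cpu-vs-gpu.py this would be cpu or gpu.
    elapsed = time_after_warmup(lambda: sum(range(10_000)))
    print(f"Total for 10 runs: {elapsed:.6f} s")
```

Passing the callable directly to timeit.timeit avoids the setup="from __main__ import ..." string used in the script.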
Create a Slurm submission script named slurm-tensorflow-python.sh in your /home directory containing the following code:
#!/bin/bash
# Please be aware this is an extremely basic Slurm script intended to target a single GPU-enabled node.
#SBATCH --nodes=1
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --job-name=tensorflow-test
#SBATCH --output=test-job.%j.out
module load py-tensorflow
# The ~ expands to your home directory, where cpu-vs-gpu.py was saved
python ~/cpu-vs-gpu.py
Submit the test TensorFlow job to Slurm using sbatch:
[hpcuser1@clipper ~]$ sbatch slurm-tensorflow-python.sh
Submitted batch job 475
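Slurm replaces %j in the --output pattern with the job ID that sbatch prints. When submitting from a wrapper script, the output file name can be derived from that line. This is a minimal sketch: the helper names parse_job_id and output_file are hypothetical, and only the %j placeholder is handled.

```python
import re


def parse_job_id(sbatch_output):
    """Extract the job ID from a line like 'Submitted batch job 475'."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"unrecognized sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def output_file(job_id, pattern="test-job.%j.out"):
    """Expand %j as Slurm does for --output (only %j is handled here)."""
    return pattern.replace("%j", str(job_id))


if __name__ == "__main__":
    job_id = parse_job_id("Submitted batch job 475")
    print(output_file(job_id))
```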
Results will be located in your /home directory in a file named test-job.<JOBID>.out:
... removed for brevity
Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
0.16155333584174514
GPU (s):
0.03590483497828245
GPU speedup over CPU: 4x
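The script reports the speedup with int(), which truncates the ratio. Recomputing it from the two timings above shows the unrounded value is about 4.5x:

```python
cpu_time = 0.16155333584174514  # seconds, from the sample output above
gpu_time = 0.03590483497828245  # seconds, from the sample output above

ratio = cpu_time / gpu_time
print(f"truncated: {int(ratio)}x")    # the form cpu-vs-gpu.py prints
print(f"two decimals: {ratio:.2f}x")  # keeps the fractional part
```

Timings for such a small workload vary between runs, so treat either figure as indicative only.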
Running TensorFlow from R using Slurm Job
Create an R script named tensorflow-quickstart.R in your /home directory containing the following code:
library(tensorflow)
library(keras)

# Load MNIST and scale pixel values to the range [0, 1].
c(c(x_train, y_train), c(x_test, y_test)) %<-% keras::dataset_mnist()
x_train <- x_train / 255
x_test <- x_test / 255

# Define a small fully connected classifier that outputs logits.
model <- keras_model_sequential(input_shape = c(28, 28)) %>%
  layer_flatten() %>%
  layer_dense(128, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(10)

# Inspect the untrained model's logits and probabilities for two images.
predictions <- predict(model, x_train[1:2, , ])
predictions
tf$nn$softmax(predictions)

# Loss that takes integer labels and raw logits.
loss_fn <- loss_sparse_categorical_crossentropy(from_logits = TRUE)
loss_fn(y_train[1:2], predictions)

model %>% compile(
  optimizer = "adam",
  loss = loss_fn,
  metrics = "accuracy"
)

# Train for five epochs, then evaluate on the held-out test set.
model %>% fit(x_train, y_train, epochs = 5)
model %>% evaluate(x_test, y_test, verbose = 2)
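The R script applies softmax to the model's logits and uses sparse categorical cross-entropy with from_logits = TRUE. To make those two quantities concrete, here is a pure-Python sketch for a single sample. It illustrates the math only and is not how Keras implements it. The function names are chosen for this example.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, logits):
    """Negative log-probability of the true class index, from logits."""
    return -math.log(softmax(logits)[label])


if __name__ == "__main__":
    logits = [0.0] * 10  # a model that favors no digit over another
    print(sparse_categorical_crossentropy(3, logits))  # equals ln(10)
```

A model with no preference among the ten digits has a loss of ln(10), roughly 2.3, which is why an untrained network typically starts near that value.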
Create a Slurm submission script named slurm-tensorflow-R.sh in your /home directory containing the following code:
#!/bin/bash
# Please be aware this is an extremely basic Slurm script intended to target a single node on the debug queue.
#SBATCH --nodes=1
#SBATCH --partition=debug
#SBATCH --ntasks=1
#SBATCH --job-name=tensorflow-test
#SBATCH --output=test-job.%j.out
module load py-tensorflow
# The ~ expands to your home directory, where tensorflow-quickstart.R was saved
Rscript ~/tensorflow-quickstart.R
Submit the test TensorFlow job to Slurm using sbatch:
[hpcuser1@clipper ~]$ sbatch slurm-tensorflow-R.sh
Submitted batch job 479
Results will be located in your /home directory in a file named test-job.<JOBID>.out:
... removed for brevity
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2973 - accuracy: 0.9139
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1436 - accuracy: 0.9575
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1076 - accuracy: 0.9670
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0874 - accuracy: 0.9729
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0730 - accuracy: 0.9771
313/313 - 0s - loss: 0.0713 - accuracy: 0.9793 - 404ms/epoch - 1ms/step
loss accuracy
0.07126244 0.97930002
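The step counts in the log (1875 per training epoch, 313 for evaluation) follow from the dataset sizes and the batch size. This short calculation assumes the Keras default batch size of 32, since the script does not set one, and the standard MNIST split of 60,000 training and 10,000 test images.

```python
import math

BATCH_SIZE = 32          # Keras default when no batch_size is given
TRAIN_SAMPLES = 60_000   # MNIST training images
TEST_SAMPLES = 10_000    # MNIST test images

# The last batch may be partial, so round up.
train_steps = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
test_steps = math.ceil(TEST_SAMPLES / BATCH_SIZE)
print(train_steps, test_steps)
```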
Additional Information
TensorFlow 2.14 currently has a bug that writes the following lines to the job log:
2023-11-07 11:00:11.236651: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-07 11:00:11.236683: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-07 11:00:11.236705: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
If a TensorFlow job runs on a node without a GPU, a warning about failing to find a CUDA-capable device will be present in the job log. TensorFlow falls back to using the CPU in this case.
2023-11-07 11:19:20.434177: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
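When reviewing many job logs, it can help to filter out the factory-registration lines and flag runs that fell back to the CPU. The sketch below matches on substrings taken from the messages above. The function name summarize_log is hypothetical, and the substrings may need adjusting for other TensorFlow versions.

```python
# Substrings copied from the log messages shown in this article.
KNOWN_NOISE = (
    "Unable to register cuDNN factory",
    "Unable to register cuFFT factory",
    "Unable to register cuBLAS factory",
)
NO_GPU_MARKER = "CUDA_ERROR_NO_DEVICE"


def summarize_log(lines):
    """Return (lines without known noise, True if a CPU fallback was logged)."""
    kept = [ln for ln in lines if not any(n in ln for n in KNOWN_NOISE)]
    cpu_fallback = any(NO_GPU_MARKER in ln for ln in lines)
    return kept, cpu_fallback


if __name__ == "__main__":
    sample = [
        "Unable to register cuDNN factory: Attempting to register factory "
        "for plugin cuDNN when one has already been registered",
        "failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable "
        "device is detected",
        "Epoch 1/5",
    ]
    kept, cpu_fallback = summarize_log(sample)
    print(len(kept), "lines kept; CPU fallback:", cpu_fallback)
```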