Using TensorFlow on Clipper

TensorFlow is a free and open-source Python software library for machine learning and artificial intelligence.

This article describes how to use TensorFlow on the Clipper HPC cluster, including how to utilize the NVIDIA GPUs available on select nodes via Python. An example using the R statistical computing language is also provided.

Loading TensorFlow Virtual Environment

Advanced Research Computing support has set up a globally available Python virtual environment named py-tensorflow that contains TensorFlow and Keras. Load it with the Lmod command module load. The module must be loaded before you can use TensorFlow, whether you load it manually in a shell or inside a Slurm batch job.

[hpcuser1@clipper ~]$ module load py-tensorflow
Loading py-tensorflow/2.14.0_py39
  Loading requirement: cuda12.0/toolkit/12.0.1

The module loads a specifically-configured Python installation and updates your PATH:

[hpcuser1@clipper ~]$ which python3
/cm/shared/venv/py-tensorflow-2_14_0-python3_9/bin/python3
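
The same check can be made from inside Python, which is handy at the top of a job script. The following is a minimal sketch: the helper name running_from is hypothetical, and the expected path is simply copied from the which python3 output above, so it would need updating if the module version changes.

```python
import sys
from pathlib import Path

# Assumption: path copied from the `which python3` output shown above.
EXPECTED_VENV = "/cm/shared/venv/py-tensorflow-2_14_0-python3_9"


def running_from(venv_root: str) -> bool:
    """Return True when the current interpreter's prefix is venv_root."""
    return Path(sys.prefix).resolve() == Path(venv_root).resolve()


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside py-tensorflow environment:", running_from(EXPECTED_VENV))
```

If the second line prints False, the module was most likely not loaded in the current shell or job.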

As of November 7, 2023, the environment contains the following Python packages:

[hpcuser1@clipper ~]$ pip list

Package                      Version
---------------------------- ------------
absl-py                      1.4.0
array-record                 0.5.0
astunparse                   1.6.3
cachetools                   5.3.2
certifi                      2023.7.22
charset-normalizer           3.3.2
click                        8.1.7
dm-tree                      0.1.8
etils                        1.5.2
flatbuffers                  23.5.26
fsspec                       2023.10.0
gast                         0.5.4
google-auth                  2.23.4
google-auth-oauthlib         1.0.0
google-pasta                 0.2.0
googleapis-common-protos     1.61.0
grpcio                       1.59.2
h5py                         3.10.0
idna                         3.4
importlib-metadata           6.8.0
importlib-resources          6.1.0
keras                        2.14.0
libclang                     16.0.6
Markdown                     3.5.1
MarkupSafe                   2.1.3
ml-dtypes                    0.2.0
numpy                        1.26.1
nvidia-cublas-cu11           11.11.3.6
nvidia-cuda-cupti-cu11       11.8.87
nvidia-cuda-nvcc-cu11        11.8.89
nvidia-cuda-runtime-cu11     11.8.89
nvidia-cudnn-cu11            8.7.0.84
nvidia-cufft-cu11            10.9.0.58
nvidia-curand-cu11           10.3.0.86
nvidia-cusolver-cu11         11.4.1.48
nvidia-cusparse-cu11         11.7.5.86
nvidia-nccl-cu11             2.16.5
oauthlib                     3.2.2
opt-einsum                   3.3.0
packaging                    23.2
pandas                       2.1.2
Pillow                       10.1.0
pip                          23.3.1
promise                      2.3
protobuf                     3.20.3
psutil                       5.9.6
pyarrow                      14.0.0
pyasn1                       0.5.0
pyasn1-modules               0.3.0
pydot                        1.4.2
pyparsing                    3.1.1
python-dateutil              2.8.2
pytz                         2023.3.post1
requests                     2.31.0
requests-oauthlib            1.3.1
rsa                          4.9
scipy                        1.11.3
setuptools                   53.0.0
six                          1.16.0
tensorboard                  2.14.1
tensorboard-data-server      0.7.2
tensorflow                   2.14.0
tensorflow-datasets          4.9.3
tensorflow-estimator         2.14.0
tensorflow-hub               0.15.0
tensorflow-io-gcs-filesystem 0.34.0
tensorflow-metadata          1.14.0
tensorrt                     8.5.3.1
termcolor                    2.3.0
toml                         0.10.2
tqdm                         4.66.1
typing_extensions            4.8.0
tzdata                       2023.3
urllib3                      2.0.7
Werkzeug                     3.0.1
wheel                        0.41.3
wrapt                        1.14.1
zipp                         3.17.0
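
Because the list above is a snapshot, it may help to confirm versions at run time rather than rely on it. The sketch below uses the standard-library importlib.metadata module (available in the Python 3.9 that this environment ships); the function name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["tensorflow", "keras", "numpy"]).items():
        print(f"{name:12} {ver or 'not installed'}")
```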

Running TensorFlow from Python with GPU using Slurm Job

This example uses code from Google’s Colab project.

Create a Python file named cpu-vs-gpu.py containing the following code in your /home directory:

cpu-vs-gpu.py

import tensorflow as tf
import timeit

device_name = tf.test.gpu_device_name()

if device_name != '/device:GPU:0':
  print(
      '\n\nNo GPU was found. This most likely means the job did not run on '
      'a GPU-enabled node. Check the partition requested in your Slurm '
      'submission script.\n\n')
  raise SystemError('GPU device not found')

def cpu():
  with tf.device('/cpu:0'):
    random_image_cpu = tf.random.normal((100, 100, 100, 3))
    net_cpu = tf.keras.layers.Conv2D(32, 7)(random_image_cpu)
    return tf.math.reduce_sum(net_cpu)

def gpu():
  with tf.device('/device:GPU:0'):
    random_image_gpu = tf.random.normal((100, 100, 100, 3))
    net_gpu = tf.keras.layers.Conv2D(32, 7)(random_image_gpu)
    return tf.math.reduce_sum(net_gpu)

# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

# Run the op several times.
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('GPU (s):')
gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))
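
The script above stops with an error when no GPU is present. For jobs that should continue on the CPU instead, a non-fatal check is possible. This is a minimal sketch, assuming the py-tensorflow module is loaded; it uses tf.config.list_physical_devices, and the wrapper name visible_gpus is hypothetical.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see (empty list if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not importable (module not loaded): report no GPUs.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("GPUs:", gpus if gpus else "none (TensorFlow will use the CPU)")
```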

Create a Slurm submission script named slurm-tensorflow-python.sh containing the following code in your /home directory:

#!/bin/bash

# Please be aware this is an extremely basic Slurm script intended to target a single GPU-enabled node.

#SBATCH --nodes=1
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --job-name=tensorflow-test
#SBATCH --output=test-job.%j.out

module load py-tensorflow

# ~ expands to your home directory, where cpu-vs-gpu.py was created
python ~/cpu-vs-gpu.py

Submit the test TensorFlow job to Slurm using sbatch:

[hpcuser1@clipper ~]$ sbatch slurm-tensorflow-python.sh
Submitted batch job 475
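
When jobs are submitted from a Python driver rather than by hand, the job ID can be read from the sbatch output and used to predict the log file name. The sketch below assumes the "Submitted batch job <id>" message format shown above and the --output=test-job.%j.out pattern from the submission script; the function names are hypothetical.

```python
import re
import subprocess


def parse_job_id(sbatch_output: str) -> int:
    """Extract the job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def output_file(job_id: int) -> str:
    """Mirror the script's --output=test-job.%j.out pattern."""
    return f"test-job.{job_id}.out"


def submit(script: str) -> int:
    """Run sbatch on a script and return the job ID (requires Slurm)."""
    result = subprocess.run(
        ["sbatch", script], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

On a login node, output_file(submit("slurm-tensorflow-python.sh")) would give the name of the log to watch.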

Results are written to the directory the job was submitted from (your /home directory in this example), in a file named test-job.<JOBID>.out.

... removed for brevity

Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
0.16155333584174514
GPU (s):
0.03590483497828245
GPU speedup over CPU: 4x
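
The script reports the speedup with int(), which truncates the ratio, so the "4x" above understates the measured value (roughly 4.5 for these timings). A small sketch that reports the fractional speedup instead, using the timings from the sample output; the helper name speedup is hypothetical.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run was than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# Timings copied from the sample job output above.
cpu_time = 0.16155333584174514
gpu_time = 0.03590483497828245
print(f"GPU speedup over CPU: {speedup(cpu_time, gpu_time):.2f}x")
```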

Running TensorFlow from R using Slurm Job

This example uses code from TensorFlow for R’s quickstart for beginners.

Create an R script named tensorflow-quickstart.R containing the following code in your /home directory:

tensorflow-quickstart.R

library(tensorflow)
library(keras)

c(c(x_train, y_train), c(x_test, y_test)) %<-% keras::dataset_mnist()
x_train <- x_train / 255
x_test <- x_test / 255

model <- keras_model_sequential(input_shape = c(28, 28)) %>%
  layer_flatten() %>%
  layer_dense(128, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(10)

predictions <- predict(model, x_train[1:2, , ])

predictions

tf$nn$softmax(predictions)

loss_fn <- loss_sparse_categorical_crossentropy(from_logits = TRUE)

loss_fn(y_train[1:2], predictions)

model %>% compile(
  optimizer = "adam",
  loss = loss_fn,
  metrics = "accuracy"
)

model %>% fit(x_train, y_train, epochs = 5)

model %>% evaluate(x_test, y_test, verbose = 2)
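
The model's last layer outputs raw scores (logits), which is why the script calls tf$nn$softmax to view them as probabilities and passes from_logits = TRUE to the loss. The plain-Python sketch below illustrates those two calculations for a single example. It is for illustration only and is not how Keras implements them; the function names are this sketch's own.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def sparse_categorical_crossentropy(label, logits):
    """Negative log-probability of the true class, computed from logits."""
    return -math.log(softmax(logits)[label])


# Ten equal logits give every digit the same probability, so the loss is ln(10).
uniform = [0.0] * 10
print(softmax(uniform)[0])
print(sparse_categorical_crossentropy(3, uniform))
```

An untrained model tends to produce near-uniform probabilities, so an initial loss close to ln(10) is a reasonable sanity check for a ten-class problem.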

Create a Slurm submission script named slurm-tensorflow-R.sh containing the following code in your /home directory:

#!/bin/bash

# Please be aware this is an extremely basic Slurm script intended to target a single node on the debug queue.

#SBATCH --nodes=1
#SBATCH --partition=debug
#SBATCH --ntasks=1
#SBATCH --job-name=tensorflow-test
#SBATCH --output=test-job.%j.out

module load py-tensorflow

# ~ expands to your home directory, where tensorflow-quickstart.R was created
Rscript ~/tensorflow-quickstart.R

Submit the test TensorFlow job to Slurm using sbatch:

[hpcuser1@clipper ~]$ sbatch slurm-tensorflow-R.sh
Submitted batch job 479

Results are written to the directory the job was submitted from (your /home directory in this example), in a file named test-job.<JOBID>.out.

... removed for brevity

Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2973 - accuracy: 0.9139
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1436 - accuracy: 0.9575
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1076 - accuracy: 0.9670
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0874 - accuracy: 0.9729
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0730 - accuracy: 0.9771

313/313 - 0s - loss: 0.0713 - accuracy: 0.9793 - 404ms/epoch - 1ms/step
      loss   accuracy
0.07126244 0.97930002

Additional Information

TensorFlow 2.14 currently has a bug that writes the following lines to the job log:

2023-11-07 11:00:11.236651: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-07 11:00:11.236683: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-07 11:00:11.236705: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

If a TensorFlow job runs on a node without a GPU, the job log contains a warning that no CUDA-capable device was found. TensorFlow falls back to the CPU in this case. The cpu-vs-gpu.py example is an exception: it checks for a GPU explicitly and exits with an error if none is found.

2023-11-07 11:19:20.434177: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
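
To tell these two kinds of message apart in a long job log, the lines can be classified by the text they contain. This is a minimal sketch: the patterns are taken from the log lines quoted above, while the category labels and function names are hypothetical.

```python
import re

# Patterns copied from the log lines quoted above; labels are this sketch's own.
FACTORY_BUG = re.compile(r"Unable to register cu(DNN|FFT|BLAS) factory")
NO_GPU = "CUDA_ERROR_NO_DEVICE"


def classify(line: str) -> str:
    """Label a log line as one of the two messages described above, or 'other'."""
    if FACTORY_BUG.search(line):
        return "factory-registration"
    if NO_GPU in line:
        return "no-gpu"
    return "other"


def summarize(path: str) -> dict:
    """Count how many lines of each kind appear in a job log file."""
    counts = {"factory-registration": 0, "no-gpu": 0, "other": 0}
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            counts[classify(line)] += 1
    return counts
```

For example, summarize("test-job.475.out") with a non-zero "no-gpu" count suggests the job ran on a node without a GPU.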

Details

Article ID: 17170
Created: Tue 11/7/23 11:32 AM
Modified: Tue 11/7/23 12:07 PM
