Using Ollama on Clipper

The Slurm and Python scripts referenced in this article are available on the GVSU ARC GitHub.

From GitHub:

Ollama is a lightweight, extensible framework for building and running language models on the local machine.

It is possible to run Ollama on Clipper.

Installing Ollama

Official Ollama binaries with NVIDIA CUDA support are available from GitHub.

The Ollama binary and required libraries are ~3 GB in size. Models can be much larger. GVSU ARC recommends locating the Ollama installation, and all associated code or virtual environments, in a single directory inside your project allocation.

Create a folder in your project directory and download/untar the Linux/amd64 Ollama release (the example below shows v0.12.9, the latest as of this writing):

# create the holding directory
mkdir -p /mnt/projects/hpcuser1_project/ollama

# enter the newly-created directory
cd /mnt/projects/hpcuser1_project/ollama

# download ollama release
wget https://github.com/ollama/ollama/releases/download/v0.12.9/ollama-linux-amd64.tgz

# untar the release file
tar -xf ollama-linux-amd64.tgz

The Ollama binary is now available at /mnt/projects/hpcuser1_project/ollama/bin/ollama.
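
As a quick optional check, you can print the version of the downloaded binary (it may warn that it could not connect to a running Ollama instance, which is expected at this point):

# confirm the binary runs by printing its version
/mnt/projects/hpcuser1_project/ollama/bin/ollama --version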

Running Ollama Server via Slurm Job

Ollama is capable of running without a GPU but performance will be severely impacted. A GPU is recommended.

The three GPU models available in Clipper have different amounts of memory. It is important to select an Ollama model that fits in the memory of the GPU assigned to your job:

  • Tesla V100 GPUs (g001-g004) - 32 GB
  • Quadro RTX8000 GPUs (g005-g008) - 48 GB
  • H100 NVL GPUs (g050-g052) - 94 GB
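
If a model needs more GPU memory than the smallest GPUs provide, the server job can be pointed at specific nodes using standard Slurm options. The lines below are an illustrative sketch to add to the server script in the next step; the node names are taken from the list above, and GVSU ARC may offer a preferred method (such as GPU type constraints) instead:

# example: request one of the H100 NVL nodes (94 GB)
#SBATCH --nodelist=g050

# or: keep the job off the 32 GB V100 nodes
#SBATCH --exclude=g001,g002,g003,g004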

Create a Slurm script, for example ollama-server.sbatch. The example script below will serve Ollama on a randomized port on any available single GPU:

#!/bin/bash
#SBATCH --job-name=ollama-server
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8 # number of cpus to use
#SBATCH --mem=32G # amount of ram to request
#SBATCH --gpus-per-node=1 # request a gpu of any type
#SBATCH --time=5-00:00:00 # time to run, max is five days shown here
#SBATCH --output=ollama-server.out # standard output/error location
#SBATCH --dependency=singleton # two jobs with same name cannot run (don't run two instances of the same ollama server)
#SBATCH --signal=B:SIGINT@120 # two minutes before job ends, send the equivalent of Ctrl-C to ollama so it can cleanly shutdown

# Set the base location for your Ollama installation/project
export OLLAMA_DIR=/mnt/projects/hpcuser1_project/ollama/

# Add the ollama binary location to your session PATH
export PATH=$PATH:$OLLAMA_DIR/bin/

# Set a location for model storage
export OLLAMA_MODELS=$OLLAMA_DIR/models

# Randomize the Ollama port
export OLLAMA_PORT=$(shuf -i 10000-30000 -n 1) # select a random port to run on
export OLLAMA_HOST=0.0.0.0:$OLLAMA_PORT

# Output the connection details to a file
echo "export OLLAMA_HOST=$(hostname):$OLLAMA_PORT" > $OLLAMA_DIR/connection.txt

echo ""
echo "----------------------------------------------------------------------------------------"
echo ""
echo "  Ollama Server Connection Details:"
echo ""
echo "      Server: $(hostname)"
echo "        Port: $OLLAMA_PORT"
echo ""
echo "  This job will end at $(squeue --noheader -j $SLURM_JOBID -o %e)."
echo "  Ollama server will gracefully terminate two minutes before the end of the job."
echo ""
echo "  Connection details have been saved in: $OLLAMA_DIR/connection.txt"
echo "  Source these variables in other Slurm jobs by running:"
echo ""
echo "      export OLLAMA_DIR=/mnt/projects/hpcuser1_project/ollama/"
echo "      source $OLLAMA_DIR/connection.txt"
echo ""
echo "  To connect to this Ollama instance from outside of Clipper using SSH port forwarding:"
echo ""
echo "  ssh -t -t $USER@clipper.gvsu.edu -L 11434:localhost:$OLLAMA_PORT ssh $(hostname) -L $OLLAMA_PORT:localhost:$OLLAMA_PORT"
echo ""
echo "  This command will forward all requests on port 11434 from your local machine"
echo "  to the Ollama server instance running on $(hostname)."
echo "  Please note this command will change for every submission of this job."
echo "  Check this output for the correct ports to forward each time this job is submitted."
echo ""
echo "----------------------------------------------------------------------------------------"
echo ""

# run ollama server until job times out or is cancelled
ollama serve

Submit the job:

sbatch ollama-server.sbatch

Connection details will be located at the top of the ollama-server.out file.
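
Once the job has started (its state can be checked with squeue), view the top of the file; a minimal example, assuming the file names used above:

# confirm the server job is running
squeue -u $USER

# show the connection banner at the top of the output file
head -n 40 ollama-server.out

The output will look similar to the following: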

----------------------------------------------------------------------------------------

  Ollama Server Connection Details:

      Server: g002.clipper.gvsu.edu
        Port: 12754

  This job will end at 2025-11-12T07:41:53.
  Ollama server will gracefully terminate two minutes before the end of the job.

  To connect to this instance from outside of Clipper using SSH port forwarding:

  ssh -t -t hpcuser1@clipper.gvsu.edu -L 11434:localhost:12754 ssh g002.clipper.gvsu.edu -L 12754:localhost:12754

  This command will forward all requests on port 11434 from your local machine
  to the Ollama server instance running on g002.clipper.gvsu.edu.
  Please note this command will change for every submission of this job.
  Check this output for the correct ports to forward each time this job is submitted.

----------------------------------------------------------------------------------------

Ollama currently has no built-in authentication mechanism. This means another cluster user could potentially connect to your Ollama instance while it is running.

Interacting with the Ollama Server Using the Ollama Python Library

The earlier example assumes you will interact with Ollama through a client installed on your local system outside of Clipper (for example, Chatbox).
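
Once the SSH port forwarding command from the server job output is running, you can confirm the connection from your local machine with a quick request to the Ollama HTTP API (a minimal check, assuming the default forwarded port 11434 shown above):

# ask the forwarded Ollama server for its version
curl http://localhost:11434/api/version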

It is also possible to interact with the Ollama server from another batch job. In this example, the Ollama Python library is used.

Create a Python virtual environment with the Ollama library installed:

python3 -m venv /mnt/projects/hpcuser1_project/ollama/venv
source /mnt/projects/hpcuser1_project/ollama/venv/bin/activate
python3 -m pip install ollama
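
To confirm the library was installed into the environment (optional):

# show details for the installed ollama package
python3 -m pip show ollama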

Create a Python script with an Ollama prompt, for example sky-blue.py:

#!/usr/bin/env python3

from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print("")
print(response['message']['content'])
print("")

Then create a job script, for example ollama-client.sbatch, that sources the Ollama connection details and runs the Python script. The Ollama Python library reads the OLLAMA_HOST environment variable, so sourcing connection.txt is enough to point the client at your running server:

#!/bin/bash
#SBATCH --job-name=ollama-client
#SBATCH --partition=cpu # or bigmem, the client does not need a gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4 # number of cpus to use
#SBATCH --mem=16G # amount of ram to request
#SBATCH --time=5-00:00:00 # time to run, max is five days shown here
#SBATCH --output=ollama-client.out # standard output/error location

# Set the base location for your Ollama installation/project
export OLLAMA_DIR=/mnt/projects/hpcuser1_project/ollama/

# This adds the ollama binary location to your session PATH
export PATH=$PATH:$OLLAMA_DIR/bin/

# Source the connection details from the server into this session
source $OLLAMA_DIR/connection.txt

# Echo out the ollama host
echo ""
echo "Using Ollama server located at: $OLLAMA_HOST"
echo ""

# pull a model, in this case gemma3 to match our python script
# list of models is available here: https://ollama.com/library
echo "Pulling gemma3..."
echo ""
ollama pull gemma3
echo ""

# source our python environment with the ollama client installed
source $OLLAMA_DIR/venv/bin/activate

# run our python script
python sky-blue.py

Finally, submit the job:

sbatch ollama-client.sbatch
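
The client job will wait for resources, pull the model, and then write the model's response to ollama-client.out. To watch the output as the job runs (assuming the output file name from the script above):

# follow the client job output
tail -f ollama-client.out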