Unsloth Blackwell-Compatible Docker Image

A multi-stage Docker image for Unsloth that runs on every current NVIDIA datacenter and consumer GPU from Turing through Blackwell, on linux/amd64.

Cross-host validation: the same image, built on a GCP B200 host, also runs on an AWS B200 host with bit-for-bit identical LoRA loss progression.

What's in the tarball

| Component | Version |
|---|---|
| Base image | nvidia/cuda:12.8.1-cudnn-runtime-ubuntu24.04 |
| PyTorch | 2.10.0+cu128 |
| Triton | 3.6.0 |
| xformers | 0.0.34 (cu128) |
| bitsandbytes | 0.49.2 |
| Unsloth | 2026.5.6 |
| Unsloth Zoo | 2026.5.4 |
| transformers | 5.5.0 |
| trl | 0.24.0 |
| peft | 0.19.1 |
| accelerate | 1.13.0 |
| Built-in SASS | sm_70 sm_75 sm_80 sm_86 sm_90 sm_100 sm_120 |
| Target arch list | 7.5;8.0;8.6;8.9;9.0;10.0;10.3;12.0;12.1+PTX |

Supported GPUs

| Compute capability | GPU family | Examples | Status |
|---|---|---|---|
| sm_75 | Turing | T4, RTX 20-series, Quadro RTX | Works (no bf16, falls back to fp16) |
| sm_80 | Ampere DC | A100, A30 | Native SASS |
| sm_86 | Ampere | RTX A6000, A40, RTX 30-series | Native SASS |
| sm_89 | Ada Lovelace | L4, L40, L40S, RTX 40-series, RTX 6000 Ada | JIT-PTX from sm_86 |
| sm_90 | Hopper | H100, H200, GH200 | Native SASS |
| sm_100 | Blackwell DC | B100, B200, GB200 | Native SASS |
| sm_103 | Blackwell DC | B300, GB300 | JIT-PTX from sm_100 |
| sm_120 | Blackwell consumer | RTX 50-series, RTX PRO 6000 Blackwell | Native SASS |
| sm_121 | Blackwell | GB10 (DGX Spark) | JIT-PTX from sm_120 |

DGX Spark (GB10) is an ARM host; this image is linux/amd64 only. A linux/arm64 variant is on the roadmap.
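The Status column can be read as a simple lookup: a GPU either has a matching built-in SASS target or falls back to the nearest lower target of the same major version. The sketch below illustrates that reading only. The function name `resolve_target` and the fallback rule are assumptions made for illustration, not code from the image, and the mode label "fallback" stands in for what the table calls JIT-PTX.

```python
# SASS targets copied from the "Built-in SASS" row of this README.
BUILT_IN_SASS = (70, 75, 80, 86, 90, 100, 120)


def resolve_target(capability, built=BUILT_IN_SASS):
    """Map a compute capability (e.g. 89 for sm_89) to the target it would use.

    Returns (target, mode), where mode is "native", "fallback", or "unsupported".
    """
    if capability in built:
        return capability, "native"
    # Assumed rule: pick the highest built target that shares the major
    # version and has a lower minor version (sm_89 -> sm_86, sm_103 -> sm_100).
    major = capability // 10
    candidates = [t for t in built if t // 10 == major and t < capability]
    if candidates:
        return max(candidates), "fallback"
    return None, "unsupported"


for sm in (75, 89, 103, 121):
    print(f"sm_{sm}", resolve_target(sm))
```

With the list above, sm_89, sm_103, and sm_121 resolve to sm_86, sm_100, and sm_120 respectively, which matches the rows marked as non-native in the table.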

Quick start

Pull the image (via Hugging Face Hub)

pip install -U huggingface_hub
huggingface-cli login

huggingface-cli download danielhanchen/unsloth-blackwell-docker \
    unsloth-blackwell.tar.gz --local-dir /tmp

gunzip -c /tmp/unsloth-blackwell.tar.gz | docker load
docker images unsloth-blackwell:test

Or use the bundled helper from the PR branch:

git clone -b docker-blackwell-build https://github.com/unslothai/unsloth.git
cd unsloth
bash docker/hf_pull.sh danielhanchen/unsloth-blackwell-docker \
     unsloth-blackwell.tar.gz unsloth-blackwell:test
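To confirm which tag a downloaded tarball will register before (or after) running docker load, the archive's manifest.json can be read directly. This is a minimal sketch that relies on the standard `docker save` layout, where manifest.json is a JSON array whose entries carry a RepoTags list. The function name `image_tags_in_tarball` is hypothetical, and the path in the usage note assumes the download location used above.

```python
import json
import tarfile


def image_tags_in_tarball(path):
    """Return the image tags recorded in a `docker save` archive.

    Works for plain or gzip-compressed tarballs. Finding manifest.json may
    require scanning the whole archive, so this can be slow on large images.
    """
    with tarfile.open(path, "r:*") as tar:
        manifest_file = tar.extractfile("manifest.json")
        if manifest_file is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(manifest_file)
    tags = []
    for entry in manifest:
        # RepoTags is null for untagged images, hence the `or []`.
        tags.extend(entry.get("RepoTags") or [])
    return tags
```

For example, `image_tags_in_tarball("/tmp/unsloth-blackwell.tar.gz")` should include unsloth-blackwell:test if the tarball is the one described here.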

Run a quick smoke test (5-step LoRA on Llama-3.2-1B-4bit)

docker run --rm --gpus all unsloth-blackwell:test python /workspace/smoke_test.py

Expected output starts with "Unsloth container: N GPU(s). Primary: <your GPU> sm_XX bf16=True" and ends with "=== all checks passed ===" after 5 LoRA steps with decreasing loss.
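For scripted checks (for example in CI), the two markers described above can be asserted on the captured output. This is a rough sketch: the function name `smoke_output_ok` is hypothetical, and it assumes the banner is the first non-empty line, which may not hold if the container prints warnings first.

```python
BANNER_PREFIX = "Unsloth container:"
FINAL_MARKER = "=== all checks passed ==="


def smoke_output_ok(stdout):
    """Return True if captured smoke-test output has both expected markers."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        return False
    starts_ok = lines[0].startswith(BANNER_PREFIX)
    ends_ok = FINAL_MARKER in lines[-1]
    return starts_ok and ends_ok
```

The output itself could be captured with `subprocess.run(..., capture_output=True, text=True)` around the docker run command shown above. The loss values are not checked here because their log format is not documented in this README.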

Run real training

The bundled run.sh wrapper sets the docker flags people most often forget (--gpus all, --ipc=host, --ulimit memlock=-1, plus mounts for HF cache and Triton cache):

bash docker/run.sh                               # interactive python REPL
bash docker/run.sh bash                          # shell in container
bash docker/run.sh python /workspace/host/train.py
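For readers who prefer to see the flags spelled out, the sketch below assembles a docker run command with the options named above. It is an approximation and not the contents of run.sh: the function name `docker_run_argv`, the host cache locations, and the container-side mount points (/root/.cache/huggingface and /root/.triton) are all assumptions.

```python
import os
import shlex


def docker_run_argv(image, command, hf_cache=None, triton_cache=None):
    """Build a `docker run` argument list with the flags listed above."""
    home = os.path.expanduser("~")
    # Assumed host-side cache locations; override them via the arguments.
    hf_cache = hf_cache or os.path.join(home, ".cache", "huggingface")
    triton_cache = triton_cache or os.path.join(home, ".triton")
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",            # expose all GPUs to the container
        "--ipc=host",               # share host IPC (shared memory)
        "--ulimit", "memlock=-1",   # lift the locked-memory limit
        # Assumed container-side mount points for the two caches.
        "-v", f"{hf_cache}:/root/.cache/huggingface",
        "-v", f"{triton_cache}:/root/.triton",
        image, *command,
    ]


print(shlex.join(docker_run_argv("unsloth-blackwell:test", ["bash"])))
```

Returning a list instead of a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.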

Build host requirements

  • Docker 28+ with buildx plugin (sudo apt-get install -y docker-buildx)
  • nvidia-container-toolkit (for --gpus all)
  • Host NVIDIA driver compatible with CUDA 12.8:
    • >= 570 for sm_120 / sm_121
    • >= 555 for sm_100 / sm_103
    • >= 535 for sm_90
    • >= 525 for sm_75 / sm_80 / sm_86 / sm_89
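The driver floor per architecture can be encoded as a small lookup for preflight checks. The sketch below copies the thresholds from the list above. The function name `driver_ok` is hypothetical, and it assumes the driver version is a dotted string such as the one nvidia-smi reports, comparing only the leading branch number.

```python
# Minimum driver branch per compute capability, copied from the list above.
MIN_DRIVER_BY_SM = {
    75: 525, 80: 525, 86: 525, 89: 525,
    90: 535,
    100: 555, 103: 555,
    120: 570, 121: 570,
}


def driver_ok(capability, driver_version):
    """Check a driver version string (e.g. "570.86.15") against the floor."""
    required = MIN_DRIVER_BY_SM.get(capability)
    if required is None:
        raise ValueError(f"sm_{capability} is not in the supported list")
    branch = int(driver_version.split(".")[0])
    return branch >= required
```

A call such as `driver_ok(120, "565.57.01")` returns False, which signals that an RTX 50-series host needs a newer driver before the image can use the GPU.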

No GPU is needed at build time. The cu128 wheels are fat binaries (cross-compiled upstream by PyTorch), and the Dockerfile's build-time verification reads compiled wheel metadata rather than calling into CUDA. The image can be built on ubuntu-latest GitHub Actions runners with no GPU attached.
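One way to verify an environment without touching CUDA is to compare installed package metadata against the pinned versions, since `importlib.metadata` reads wheel metadata without importing the packages. The sketch below illustrates that idea and is not necessarily what the Dockerfile does. The function name `find_mismatches` is hypothetical, the pins are copied from the component table, and xformers is left out because its exact version string is not given there.

```python
from importlib import metadata

# Versions copied from the component table in this README.
PINS = {
    "torch": "2.10.0+cu128",
    "triton": "3.6.0",
    "bitsandbytes": "0.49.2",
    "transformers": "5.5.0",
    "trl": "0.24.0",
    "peft": "0.19.1",
    "accelerate": "1.13.0",
}


def find_mismatches(pins, lookup=metadata.version):
    """Return {name: (expected, found)} for packages that differ from pins.

    `found` is None when the package is not installed. No package is
    imported, so no CUDA library is loaded and no GPU is required.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `find_mismatches(PINS)` inside the image should return an empty dict if the installed versions match the table. The `lookup` parameter exists so the check can be exercised without the packages installed.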

Source

The image is built from the docker-blackwell-build branch on unslothai/unsloth (PR #5748).

Once that PR lands, the same image will be published to Docker Hub at docker.io/unsloth/unsloth via a GitHub Actions workflow that runs on every push to main, every tag, and weekly via cron. The HF Hub mirror here is a temporary blob for cross-host validation testing.

License

The Docker image follows the Apache 2.0 license of unslothai/unsloth. The bundled PyTorch wheels, NVIDIA CUDA libraries, and other dependencies follow their own respective licenses.
