Unsloth Blackwell-Compatible Docker Image

A multi-stage Docker image for Unsloth that runs on every current NVIDIA datacenter and consumer GPU from Turing through Blackwell, on linux/amd64.

Cross-host validation: the same image, built on a GCP B200 host, also runs on an AWS B200 host with bit-for-bit identical LoRA loss progression.

What's in the tarball

| Component | Version |
| --- | --- |
| Base image | `nvidia/cuda:12.8.1-cudnn-runtime-ubuntu24.04` |
| PyTorch | `2.10.0+cu128` |
| Triton | `3.6.0` |
| xformers | `0.0.34` (cu128) |
| bitsandbytes | `0.49.2` |
| Unsloth | `2026.5.6` |
| Unsloth Zoo | `2026.5.4` |
| transformers | `5.5.0` |
| trl | `0.24.0` |
| peft | `0.19.1` |
| accelerate | `1.13.0` |
| Built-in SASS | `sm_70 sm_75 sm_80 sm_86 sm_90 sm_100 sm_120` |
| Target arch list | `7.5;8.0;8.6;8.9;9.0;10.0;10.3;12.0;12.1+PTX` |
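
To confirm that a loaded image really carries the pins in the table, a small check along these lines can be run inside the container. This is a sketch, not part of the image: the distribution names in `EXPECTED` are assumed to be the usual PyPI names, and local build tags such as `+cu128` are stripped before comparing.

```python
from importlib import metadata

# Version pins copied from the component table. The keys are assumed to be
# the PyPI distribution names; adjust them if the image uses different ones.
EXPECTED = {
    "torch": "2.10.0",
    "triton": "3.6.0",
    "xformers": "0.0.34",
    "bitsandbytes": "0.49.2",
    "unsloth": "2026.5.6",
    "unsloth_zoo": "2026.5.4",
    "transformers": "5.5.0",
    "trl": "0.24.0",
    "peft": "0.19.1",
    "accelerate": "1.13.0",
}


def installed_versions(names):
    """Return {name: version string, or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """Return {name: (expected, installed)} for every pin that does not match."""
    mismatches = {}
    for name, want in expected.items():
        got = installed.get(name)
        # Drop a local build tag such as "+cu128" before comparing.
        base = got.split("+", 1)[0] if got else None
        if base != want:
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    report = find_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, got) in report.items():
        print(f"{name}: expected {want}, found {got}")
```

An empty report means every listed package matches its pin. The base image, SASS list, and target arch list are not covered by this check.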

Supported GPUs

| Compute capability | GPU family | Examples | Status |
| --- | --- | --- | --- |
| sm_75 | Turing | T4, RTX 20-series, Quadro RTX | Works (no bf16, falls back to fp16) |
| sm_80 | Ampere DC | A100, A30 | Native SASS |
| sm_86 | Ampere | RTX A6000, A40, RTX 30-series | Native SASS |
| sm_89 | Ada Lovelace | L4, L40, L40S, RTX 40-series, RTX 6000 Ada | JIT-PTX from sm_86 |
| sm_90 | Hopper | H100, H200, GH200 | Native SASS |
| sm_100 | Blackwell DC | B100, B200, GB200 | Native SASS |
| sm_103 | Blackwell DC | B300, GB300 | JIT-PTX from sm_100 |
| sm_120 | Blackwell consumer | RTX 50-series, RTX PRO 6000 Blackwell | Native SASS |
| sm_121 | Blackwell | GB10 (DGX Spark) | JIT-PTX from sm_120 |

DGX Spark (GB10) is an ARM host; this image is linux/amd64 only. A linux/arm64 variant is on the roadmap.
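
The Status column can be derived from the Built-in SASS list: an exact match is native, otherwise the nearest lower target with the same major version is the fallback source. The sketch below only mirrors that table logic; `classify` and `sm_name` are illustrative helpers and do not describe how PyTorch or the CUDA driver actually selects kernels.

```python
# SASS targets from the "Built-in SASS" row, as (major, minor) pairs.
BUILT_IN_SASS = [(7, 0), (7, 5), (8, 0), (8, 6), (9, 0), (10, 0), (12, 0)]


def classify(capability, sass=BUILT_IN_SASS):
    """Classify a (major, minor) compute capability against the SASS list.

    Returns ("native", arch) on an exact match, ("fallback", arch) when a
    lower target with the same major version exists, and
    ("unsupported", None) otherwise.
    """
    major, minor = capability
    if capability in sass:
        return "native", capability
    # Candidates share the major version and have a smaller minor version.
    lower = [c for c in sass if c[0] == major and c[1] < minor]
    if lower:
        return "fallback", max(lower)
    return "unsupported", None


def sm_name(capability):
    """Format (10, 3) as 'sm_103'."""
    major, minor = capability
    return f"sm_{major}{minor}"


if __name__ == "__main__":
    for cap in [(7, 5), (8, 9), (9, 0), (10, 3), (12, 1)]:
        status, source = classify(cap)
        print(sm_name(cap), status, sm_name(source) if source else "-")
```

On a live host the input pair can come from `torch.cuda.get_device_capability()`, which returns the same `(major, minor)` shape.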

Quick start

Pull the image (via Hugging Face Hub)

pip install -U huggingface_hub
huggingface-cli login

huggingface-cli download danielhanchen/unsloth-blackwell-docker \
    unsloth-blackwell.tar.gz --local-dir /tmp

gunzip -c /tmp/unsloth-blackwell.tar.gz | docker load
docker images unsloth-blackwell:test

Or use the bundled helper from the PR branch:

git clone -b docker-blackwell-build https://github.com/unslothai/unsloth.git
cd unsloth
bash docker/hf_pull.sh danielhanchen/unsloth-blackwell-docker \
     unsloth-blackwell.tar.gz unsloth-blackwell:test

Run a quick smoke test (5-step LoRA on Llama-3.2-1B-4bit)

docker run --rm --gpus all unsloth-blackwell:test python /workspace/smoke_test.py

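Expected output starts with `Unsloth container: N GPU(s). Primary: <your GPU> sm_XX bf16=True` and ends with `=== all checks passed ===` after 5 LoRA steps with decreasing loss. On Turing (sm_75), which has no bf16 support, the first line will presumably report `bf16=False` instead.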
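
For automated runs, the first and last lines of the smoke test can be checked with a small parser. This is a sketch that assumes the banner matches the template quoted above character for character; the regular expression and the `check_smoke_output` helper are illustrative and are not shipped with the image.

```python
import re

# Pattern built from the banner template quoted above (an assumption about
# the exact spacing and punctuation of the real output).
BANNER = re.compile(
    r"^Unsloth container: (?P<count>\d+) GPU\(s\)\. "
    r"Primary: (?P<name>.+) sm_(?P<sm>\d+) bf16=(?P<bf16>True|False)$"
)
FOOTER = "=== all checks passed ==="


def check_smoke_output(text):
    """Validate the first and last non-empty lines of the smoke-test output.

    Returns the parsed banner fields, or raises ValueError on a mismatch.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty output")
    banner = BANNER.match(lines[0])
    if banner is None:
        raise ValueError(f"unexpected first line: {lines[0]!r}")
    if lines[-1] != FOOTER:
        raise ValueError(f"unexpected last line: {lines[-1]!r}")
    return {
        "gpu_count": int(banner["count"]),
        "gpu_name": banner["name"],
        "sm": int(banner["sm"]),
        "bf16": banner["bf16"] == "True",
    }
```

The text can be captured with `subprocess.run([...], capture_output=True, text=True).stdout`. If library warnings are printed before the banner, the first-line check would need to be loosened.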

Run real training

The bundled `run.sh` wrapper sets the Docker flags that are easiest to forget (`--gpus all`, `--ipc=host`, `--ulimit memlock=-1`) and mounts the Hugging Face and Triton caches:

bash docker/run.sh                               # interactive python REPL
bash docker/run.sh bash                          # shell in container
bash docker/run.sh python /workspace/host/train.py
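
For readers who want to see what such a wrapper amounts to, here is a rough sketch of the equivalent `docker run` command line, built in Python. Only the three flags come from the description above. The cache paths, the root home directory inside the container, the `-it` flag, and the `/workspace/host` mount of the current directory are assumptions, so check `docker/run.sh` for the real values.

```python
import os
import shlex


def docker_run_argv(command, image="unsloth-blackwell:test", home=None, workdir=None):
    """Build a `docker run` argument list with the flags named above.

    All mount paths are illustrative assumptions, not read from run.sh.
    """
    home = home or os.path.expanduser("~")
    workdir = workdir or os.getcwd()
    hf_cache = os.path.join(home, ".cache", "huggingface")  # assumed host path
    triton_cache = os.path.join(home, ".triton")            # assumed host path
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",            # expose all GPUs to the container
        "--ipc=host",               # share host IPC (shared memory)
        "--ulimit", "memlock=-1",   # lift the locked-memory limit
        "-v", f"{hf_cache}:/root/.cache/huggingface",
        "-v", f"{triton_cache}:/root/.triton",
        "-v", f"{workdir}:/workspace/host",
        image,
        *command,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(docker_run_argv(["python", "/workspace/host/train.py"])))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Docker or a GPU.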

Build host requirements

- Docker 28+ with the buildx plugin (`sudo apt-get install -y docker-buildx`)
- nvidia-container-toolkit (for `--gpus all`)
- Host NVIDIA driver compatible with CUDA 12.8:
  - >= 570 for sm_120 / sm_121
  - >= 555 for sm_100 / sm_103
  - >= 535 for sm_90
  - >= 525 for sm_75 / sm_80 / sm_86 / sm_89
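
The driver thresholds above are easy to encode as a lookup. The sketch below copies them verbatim and compares only the driver's major branch number; `MIN_DRIVER`, `driver_major`, and `driver_ok` are illustrative names.

```python
# Minimum driver branch per compute capability, copied from the list above.
MIN_DRIVER = {
    "sm_120": 570, "sm_121": 570,
    "sm_100": 555, "sm_103": 555,
    "sm_90": 535,
    "sm_75": 525, "sm_80": 525, "sm_86": 525, "sm_89": 525,
}


def driver_major(version):
    """Extract the branch number from a version string like '570.124.06'."""
    return int(version.strip().split(".", 1)[0])


def driver_ok(sm, driver_version):
    """Return True if the driver branch meets the listed minimum for `sm`."""
    if sm not in MIN_DRIVER:
        raise KeyError(f"no driver requirement listed for {sm}")
    return driver_major(driver_version) >= MIN_DRIVER[sm]


if __name__ == "__main__":
    print(driver_ok("sm_120", "570.124.06"))
    print(driver_ok("sm_100", "550.54.15"))
```

The installed driver version can be read on the host with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.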

No GPU is needed at build time. The cu128 wheels are fat binaries (cross-compiled upstream by PyTorch), and the Dockerfile's build-time verification reads compiled wheel metadata rather than calling into CUDA. The image can be built on ubuntu-latest GitHub Actions runners with no GPU attached.

Source

The image is built from the docker-blackwell-build branch on unslothai/unsloth (PR #5748).

Once that PR lands, the same image will be published to Docker Hub at docker.io/unsloth/unsloth via a GitHub Actions workflow that runs on every push to main, every tag, and weekly via cron. The HF Hub mirror here is a temporary blob for cross-host validation testing.

License

The Docker image follows the Apache 2.0 license of unslothai/unsloth. The bundled PyTorch wheels, NVIDIA CUDA libraries, and other dependencies follow their own respective licenses.
