Unsloth Blackwell-Compatible Docker Image
A multi-stage Docker image for Unsloth that runs on current NVIDIA datacenter and consumer GPUs, from Turing (sm_75) through Blackwell, on linux/amd64 hosts.
Cross-host validation: the same image, built on a GCP B200, also runs on an AWS B200 and produces a bit-for-bit identical LoRA loss progression.
What's in the tarball
| Component | Version |
|---|---|
| Base image | nvidia/cuda:12.8.1-cudnn-runtime-ubuntu24.04 |
| PyTorch | 2.10.0+cu128 |
| Triton | 3.6.0 |
| xformers | 0.0.34 (cu128) |
| bitsandbytes | 0.49.2 |
| Unsloth | 2026.5.6 |
| Unsloth Zoo | 2026.5.4 |
| transformers | 5.5.0 |
| trl | 0.24.0 |
| peft | 0.19.1 |
| accelerate | 1.13.0 |
| Built-in SASS | sm_70 sm_75 sm_80 sm_86 sm_90 sm_100 sm_120 |
| Target arch list | 7.5;8.0;8.6;8.9;9.0;10.0;10.3;12.0;12.1+PTX |
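To make the "Target arch list" row easier to read, here is a small sketch that expands it into one line per compute capability. It assumes the string follows PyTorch's `TORCH_CUDA_ARCH_LIST` syntax, where a trailing `+PTX` on an entry requests PTX in addition to SASS for that entry only; `expand_arch_list` is a hypothetical helper, not part of the image.

```shell
#!/usr/bin/env bash
# Expand an arch-list string into one "sm_XY KIND" line per entry.
# ASSUMPTION: TORCH_CUDA_ARCH_LIST syntax, where "+PTX" on an entry asks for
# PTX (in addition to SASS) for that entry only.
expand_arch_list() {
  local entry kind
  local -a entries
  IFS=';' read -ra entries <<< "$1"
  for entry in "${entries[@]}"; do
    kind="SASS"
    if [[ "$entry" == *+PTX ]]; then
      kind="SASS+PTX"
      entry="${entry%+PTX}"
    fi
    # "12.1" -> "121": drop the dot between major and minor
    printf 'sm_%s %s\n' "${entry/./}" "$kind"
  done
}

expand_arch_list '7.5;8.0;8.6;8.9;9.0;10.0;10.3;12.0;12.1+PTX'
```

For the list in the table this prints nine lines, from `sm_75 SASS` to `sm_121 SASS+PTX`.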
Supported GPUs
| Compute Cap | GPU family | Examples | Status |
|---|---|---|---|
| sm_75 | Turing | T4, RTX 20-series, Quadro RTX | Works (no bf16, falls back to fp16) |
| sm_80 | Ampere DC | A100, A30 | Native SASS |
| sm_86 | Ampere | RTX A6000, A40, RTX 30-series | Native SASS |
| sm_89 | Ada Lovelace | L4, L40, L40S, RTX 40-series, RTX 6000 Ada | JIT-PTX from sm_86 |
| sm_90 | Hopper | H100, H200, GH200 | Native SASS |
| sm_100 | Blackwell DC | B100, B200, GB200 | Native SASS |
| sm_103 | Blackwell DC | B300, GB300 | JIT-PTX from sm_100 |
| sm_120 | Blackwell consumer | RTX 50-series, RTX PRO 6000 Blackwell | Native SASS |
| sm_121 | Blackwell | GB10 (DGX Spark) | JIT-PTX from sm_120 |
DGX Spark (GB10) has an ARM (aarch64) host and this image is linux/amd64 only, so it does not run on DGX Spark yet. A linux/arm64 variant is on the roadmap.
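To see which row of the table applies to a given machine, the compute capability reported by the driver can be looked up directly. This is a minimal sketch that hard-codes the table above; `gpu_status` is a hypothetical helper, and the `nvidia-smi` query in the usage comment assumes a driver recent enough to support the `compute_cap` field.

```shell
#!/usr/bin/env bash
# Look up a compute capability (as printed by nvidia-smi, e.g. "8.9") in the
# Supported GPUs table. Hypothetical helper, not shipped in the image.
gpu_status() {
  local sm="sm_${1/./}"
  case "$1" in
    7.5)                   echo "$sm: works (no bf16, falls back to fp16)" ;;
    8.0|8.6|9.0|10.0|12.0) echo "$sm: native SASS" ;;
    8.9)                   echo "$sm: JIT-PTX from sm_86" ;;
    10.3)                  echo "$sm: JIT-PTX from sm_100" ;;
    12.1)                  echo "$sm: JIT-PTX from sm_120 (ARM host, needs a linux/arm64 image)" ;;
    *)                     echo "$sm: not in the support table" >&2; return 1 ;;
  esac
}

# On a GPU host:
#   gpu_status "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
```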
Quick start
Pull the image (via Hugging Face Hub)
pip install -U huggingface_hub
huggingface-cli login
huggingface-cli download danielhanchen/unsloth-blackwell-docker \
unsloth-blackwell.tar.gz --local-dir /tmp
gunzip -c /tmp/unsloth-blackwell.tar.gz | docker load
docker images unsloth-blackwell:test
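Because the tarball is large, an interrupted download can leave a truncated file that `docker load` only partly reads. A small sketch that tests the gzip stream before loading it follows; `load_image_tarball` is a hypothetical helper name, and it only wraps the same `gunzip -c ... | docker load` pipeline shown above.

```shell
#!/usr/bin/env bash
# Verify the gzip stream end to end before handing it to `docker load`, so a
# truncated or corrupted download is reported instead of half-loaded.
# Hypothetical helper, not part of the repo.
load_image_tarball() {
  local tarball="$1"
  if [[ ! -f "$tarball" ]]; then
    echo "no such file: $tarball" >&2
    return 1
  fi
  # gzip -t decompresses without writing output and checks the CRC/length trailer.
  if ! gzip -t "$tarball"; then
    echo "corrupt or truncated archive: $tarball" >&2
    return 1
  fi
  gunzip -c "$tarball" | docker load
}

# Usage:
#   load_image_tarball /tmp/unsloth-blackwell.tar.gz && docker images unsloth-blackwell:test
```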
Or use the bundled helper from the PR branch:
git clone -b docker-blackwell-build https://github.com/unslothai/unsloth.git
cd unsloth
bash docker/hf_pull.sh danielhanchen/unsloth-blackwell-docker \
unsloth-blackwell.tar.gz unsloth-blackwell:test
Run a quick smoke test (5-step LoRA on Llama-3.2-1B-4bit)
docker run --rm --gpus all unsloth-blackwell:test python /workspace/smoke_test.py
The output should start with `Unsloth container: N GPU(s). Primary: <your GPU> sm_XX bf16=True` (bf16=False on Turing) and end with `=== all checks passed ===` after 5 LoRA steps with decreasing loss.
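In CI it can help to check those markers automatically. This sketch only looks for the two marker strings quoted above and does not parse loss values; it accepts the start marker on any line in case library warnings are printed first, which is an assumption about the log. `check_smoke_log` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Check a captured smoke-test log for the start and end markers.
# Hypothetical helper; it does not parse or compare loss values.
check_smoke_log() {
  local log="$1"
  if ! grep -q '^Unsloth container: ' "$log"; then
    echo "start marker not found in $log" >&2
    return 1
  fi
  if ! tail -n 1 "$log" | grep -qF '=== all checks passed ==='; then
    echo "last line is not the success marker in $log" >&2
    return 1
  fi
}

# Usage:
#   docker run --rm --gpus all unsloth-blackwell:test python /workspace/smoke_test.py 2>&1 | tee smoke.log
#   check_smoke_log smoke.log
```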
Run real training
The bundled run.sh wrapper sets the docker flags people most often forget (--gpus all, --ipc=host, --ulimit memlock=-1, plus mounts for HF cache and Triton cache):
bash docker/run.sh # interactive python REPL
bash docker/run.sh bash # shell in container
bash docker/run.sh python /workspace/host/train.py
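For hosts without the repo checked out, the flags listed above can be passed to `docker run` directly. The sketch below is a hypothetical stand-in for `run.sh`, not its actual contents. It assumes the container runs as root, so the HF cache is mounted at `/root/.cache/huggingface` and the Triton cache at `/root/.triton`, and that the current directory is mounted at `/workspace/host` to match the `train.py` path above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for docker/run.sh; the real script may differ.
# ASSUMPTIONS: container runs as root (cache paths under /root) and the
# current directory is mounted at /workspace/host.
run_unsloth() {
  local image="${IMAGE:-unsloth-blackwell:test}"
  local -a tty_flags=()
  # Only request a TTY when stdin is a terminal, so the same call works in CI.
  if [[ -t 0 ]]; then
    tty_flags=(-it)
  fi
  mkdir -p "$HOME/.cache/huggingface" "$HOME/.triton"
  docker run --rm "${tty_flags[@]}" \
    --gpus all \
    --ipc=host \
    --ulimit memlock=-1 \
    -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
    -v "$HOME/.triton:/root/.triton" \
    -v "$PWD:/workspace/host" \
    "$image" "$@"
}

# Usage, mirroring the run.sh examples:
#   run_unsloth                                  # image's default command
#   run_unsloth bash
#   run_unsloth python /workspace/host/train.py
```

`--ipc=host` gives the container the host's shared memory, which PyTorch data-loader workers rely on; `--ulimit memlock=-1` lifts the locked-memory limit.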
Host requirements
- Docker 28+ with the buildx plugin (sudo apt-get install -y docker-buildx)
- nvidia-container-toolkit (for --gpus all)
- Host NVIDIA driver compatible with CUDA 12.8:
  - >= 570 for sm_120 / sm_121
  - >= 555 for sm_100 / sm_103
  - >= 535 for sm_90
  - >= 525 for sm_75 / sm_80 / sm_86 / sm_89
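The driver minimums above can be checked before pulling the image. This sketch compares only the driver's major version, since that is the granularity the list gives; `min_driver_for` and `driver_ok` are hypothetical helpers, and the `nvidia-smi` fields in the usage comment assume a reasonably recent driver.

```shell
#!/usr/bin/env bash
# Minimum driver major version per compute capability, copied from the list above.
min_driver_for() {
  case "$1" in
    12.0|12.1)       echo 570 ;;
    10.0|10.3)       echo 555 ;;
    9.0)             echo 535 ;;
    7.5|8.0|8.6|8.9) echo 525 ;;
    *)               return 1 ;;
  esac
}

# $1 = compute capability (e.g. "10.0"), $2 = driver version string.
# Returns 0 if new enough, 1 if too old, 2 if the capability is not listed.
driver_ok() {
  local min major
  min="$(min_driver_for "$1")" || { echo "unknown compute capability: $1" >&2; return 2; }
  major="${2%%.*}"
  if (( major >= min )); then
    echo "ok: driver $2 >= $min for compute capability $1"
  else
    echo "too old: driver $2 < $min for compute capability $1" >&2
    return 1
  fi
}

# Usage on a GPU host:
#   nvidia-smi --query-gpu=compute_cap,driver_version --format=csv,noheader |
#     while IFS=', ' read -r cap drv; do driver_ok "$cap" "$drv"; done
```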
No GPU is needed at build time. The cu128 wheels are fat binaries (cross-compiled upstream by PyTorch), and the Dockerfile's build-time verification reads compiled wheel metadata rather than calling into CUDA. The image can be built on ubuntu-latest GitHub Actions runners with no GPU attached.
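As an illustration of the metadata-only idea (not the Dockerfile's actual check), the installed PyTorch build can be identified from wheel metadata through `importlib.metadata`, which never imports torch and therefore never initializes CUDA. `torch_version` and `expect_cuda_tag` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Read torch's version from installed wheel metadata without importing torch,
# so no CUDA initialization happens and no GPU is needed.
torch_version() {
  "${PYTHON:-python3}" -c 'import importlib.metadata as m; print(m.version("torch"))'
}

# Succeed only if a version string such as "2.10.0+cu128" carries the expected
# local tag (here, the CUDA build flavour).
expect_cuda_tag() {
  local version="$1" want="$2"
  if [[ "$version" != *+* || "${version#*+}" != "$want" ]]; then
    echo "expected a +$want build, got '$version'" >&2
    return 1
  fi
}

# Usage inside the image (works on a GPU-less build runner):
#   expect_cuda_tag "$(torch_version)" cu128
```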
Source
The image is built from the docker-blackwell-build branch on unslothai/unsloth (PR #5748).
Once that PR lands, the same image will be published to Docker Hub at docker.io/unsloth/unsloth via a GitHub Actions workflow that runs on every push to main, every tag, and weekly via cron. The HF Hub mirror here is a temporary blob for cross-host validation testing.
License
The Docker image follows the Apache 2.0 license of unslothai/unsloth. The bundled PyTorch wheels, NVIDIA CUDA libraries, and other dependencies follow their own respective licenses.