Software Last Updated: 2026-03-18
llamafile v0.10.0
This repository contains a few llamafiles built from our v0.10.0 release. These llamafiles are pre-packaged with support for CPUs, Metal GPUs on Mac, and CUDA on Linux (support for other GPU types is on our roadmap).
For more information about our project, check out our GitHub repo. To learn how to use llamafiles, check out our documentation!
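As a quick start, a llamafile is a single self-contained executable: download one of the files from the table below, mark it executable, and run it. The sketch below assumes the file has already been downloaded to the current directory; the filename is taken from the table as an example.

```shell
# On macOS and Linux: make the downloaded llamafile executable, then run it.
chmod +x Qwen3.5-0.8B-Q8_0.llamafile
./Qwen3.5-0.8B-Q8_0.llamafile

# On Windows, rename the file so it ends in .exe before running it, e.g.:
#   ren Qwen3.5-0.8B-Q8_0.llamafile Qwen3.5-0.8B-Q8_0.exe
```

Running the llamafile starts a local chat interface in your browser; see the documentation for command-line flags and server options.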
| Model | Size | License | llamafile |
|---|---|---|---|
| Qwen3.5 0.8B Q8_0 | 1.6 GB | Apache 2.0 | Qwen3.5-0.8B-Q8_0.llamafile |
| Qwen3.5 2B Q8_0 | 3.2 GB | Apache 2.0 | Qwen3.5-2B-Q8_0.llamafile |
| Ministral 3 3B Instruct 2512 Q4_K_M | 3.4 GB | Apache 2.0 | Ministral-3-3B-Instruct-2512-Q4_K_M.llamafile |
| Qwen3.5 4B Q5_K_S | 4.1 GB | Apache 2.0 | Qwen3.5-4B-Q5_K_S.llamafile |
| llava v1.6 mistral 7b Q4_K_M | 5.3 GB | Apache 2.0 | llava-v1.6-mistral-7b-Q4_K_M.llamafile |
| Apertus 8B Instruct 2509 | 5.9 GB | Apache 2.0 | Apertus-8B-Instruct-2509.llamafile |
| Qwen3.5 9B Q5_K_S | 7.4 GB | Apache 2.0 | Qwen3.5-9B-Q5_K_S.llamafile |
| Ministral 3 3B Instruct 2512 BF16 | 7.8 GB | Apache 2.0 | Ministral-3-3B-Instruct-2512-BF16.llamafile |
| llava v1.6 mistral 7b Q8_0 | 8.4 GB | Apache 2.0 | llava-v1.6-mistral-7b-Q8_0.llamafile |
| gpt-oss 20b mxfp4 | 12 GB | Apache 2.0 | gpt-oss-20b-mxfp4.llamafile |
| gpt-oss 20b Q5_K_S | 12 GB | Apache 2.0 | gpt-oss-20b-Q5_K_S.llamafile |
| LFM2 24B A2B Q5_K_M | 16 GB | lfm1.0 | LFM2-24B-A2B-Q5_K_M.llamafile |
| Qwen3.5 27B Q5_K_S | 19 GB | Apache 2.0 | Qwen3.5-27B-Q5_K_S.llamafile |
NOTE: While the llamafile project is Apache 2.0-licensed, the licenses of models we bundle with it might differ. Use the table above for reference.