Software Last Updated: 2026-03-18
llamafile v0.10.0
This repository contains a few llamafiles built from our v0.10.0 release. These llamafiles are pre-packaged with support for CPUs, Metal GPUs on Mac, and CUDA on Linux (support for other GPU types is on our roadmap).
For more information about our project, check out our GitHub repo. To learn how to use llamafiles, check out our documentation!
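As a quick start, a llamafile is a single self-contained executable: download one of the files from the table below, mark it executable, and run it. The sketch below assumes the file has already been downloaded to the current directory; the filename is taken from the table as an example.

```shell
# On macOS and Linux: make the downloaded llamafile executable, then run it.
chmod +x Qwen3.5-0.8B-Q8_0.llamafile
./Qwen3.5-0.8B-Q8_0.llamafile

# On Windows, rename the file so it ends in .exe before running it, e.g.:
#   ren Qwen3.5-0.8B-Q8_0.llamafile Qwen3.5-0.8B-Q8_0.exe
```

Running the llamafile starts a local chat interface in your browser; see the documentation for command-line flags and server options.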
| Model | Size | License | llamafile |
|---|---|---|---|
| Qwen3.5 0.8B Q8_0 | 1.6 GB | Apache 2.0 | Qwen3.5-0.8B-Q8_0.llamafile |
| Qwen3.5 2B Q8_0 | 3.2 GB | Apache 2.0 | Qwen3.5-2B-Q8_0.llamafile |
| Ministral 3 3B Instruct 2512 Q4_K_M | 3.4 GB | Apache 2.0 | Ministral-3-3B-Instruct-2512-Q4_K_M.llamafile |
| Qwen3.5 4B Q5_K_S | 4.1 GB | Apache 2.0 | Qwen3.5-4B-Q5_K_S.llamafile |
| llava v1.6 mistral 7b Q4_K_M | 5.3 GB | Apache 2.0 | llava-v1.6-mistral-7b-Q4_K_M.llamafile |
| Apertus 8B Instruct 2509 | 5.9 GB | Apache 2.0 | Apertus-8B-Instruct-2509.llamafile |
| Qwen3.5 9B Q5_K_S | 7.4 GB | Apache 2.0 | Qwen3.5-9B-Q5_K_S.llamafile |
| Ministral 3 3B Instruct 2512 BF16 | 7.8 GB | Apache 2.0 | Ministral-3-3B-Instruct-2512-BF16.llamafile |
| llava v1.6 mistral 7b Q8_0 | 8.4 GB | Apache 2.0 | llava-v1.6-mistral-7b-Q8_0.llamafile |
| gpt-oss 20b mxfp4 | 12 GB | Apache 2.0 | gpt-oss-20b-mxfp4.llamafile |
| gpt-oss 20b Q5_K_S | 12 GB | Apache 2.0 | gpt-oss-20b-Q5_K_S.llamafile |
| LFM2 24B A2B Q5_K_M | 16 GB | lfm1.0 | LFM2-24B-A2B-Q5_K_M.llamafile |
| Qwen3.5 27B Q5_K_S | 19 GB | Apache 2.0 | Qwen3.5-27B-Q5_K_S.llamafile |
NOTE: While the llamafile project is Apache 2.0-licensed, the licenses of models we bundle with it might differ. Use the table above for reference.