Instructions to use jedisct1/MiMo-7B-RL-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use jedisct1/MiMo-7B-RL-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="jedisct1/MiMo-7B-RL-GGUF", filename="MiMo-7B-RL-Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use jedisct1/MiMo-7B-RL-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
Use Docker
docker model run hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use jedisct1/MiMo-7B-RL-GGUF with Ollama:
ollama run hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
- Unsloth Studio new
How to use jedisct1/MiMo-7B-RL-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jedisct1/MiMo-7B-RL-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jedisct1/MiMo-7B-RL-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for jedisct1/MiMo-7B-RL-GGUF to start chatting
- Pi new
How to use jedisct1/MiMo-7B-RL-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "jedisct1/MiMo-7B-RL-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use jedisct1/MiMo-7B-RL-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use jedisct1/MiMo-7B-RL-GGUF with Docker Model Runner:
docker model run hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
- Lemonade
How to use jedisct1/MiMo-7B-RL-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.MiMo-7B-RL-GGUF-Q4_K_M
List all available models
lemonade list
MiMo-7B-RL (GGUF)
This is a GGUF quantized version of XiaomiMiMo/MiMo-7B-RL, optimized for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines. The model has been converted from the original SafeTensors format to GGUF.
Model Description
MiMo-7B-RL is a powerful 7B parameter language model developed by Xiaomi, specifically designed for enhanced reasoning capabilities in both mathematics and code. The original model matches the performance of OpenAI's o1-mini in many benchmarks.
Model Details
- Original Model: MiMo-7B-RL by Xiaomi
- Parameters: 7 billion
- Context Length: 32,768 tokens
- Architecture: Modified transformer with 36 layers, 32 attention heads
- Original Format: SafeTensors
- Converted Format: GGUF
- License: MIT
Key features of the original model:
- Trained using a specialized pre-training strategy focused on reasoning tasks
- Fine-tuned with reinforcement learning on 130K mathematics and code problems
- Demonstrates superior performance in both mathematical reasoning and coding tasks
- Matches performance of much larger models in reasoning capabilities
Usage
With Ollama
ollama run mimo-7b-rl-q8
With LM Studio
- Load the model through the LM Studio interface
- Select the GGUF file
- Configure your desired settings
- Start chatting!
With llama.cpp
./main -m mimo-7b-rl-q8.gguf -n 4096
Performance
The original model demonstrates impressive performance across various benchmarks:
| Benchmark | Score |
|---|---|
| MATH-500 (Pass@1) | 95.8% |
| AIME 2024 (Pass@1) | 68.2% |
| AIME 2025 (Pass@1) | 55.4% |
| LiveCodeBench v5 (Pass@1) | 57.8% |
| LiveCodeBench v6 (Pass@1) | 49.3% |
Note: Performance metrics are from the original model. The GGUF conversion may show slightly different results due to quantization.
Limitations and Biases
The model inherits any limitations and biases present in the original MiMo-7B-RL model. Additionally:
- Q8 quantization may result in slightly reduced performance compared to the original model
- The model requires careful prompt engineering for optimal results in reasoning tasks
- Performance may vary depending on the specific GGUF inference implementation used
Training Details
The model was trained by Xiaomi using:
- Pre-training on approximately 25 trillion tokens
- Three-stage data mixture strategy
- Multiple-Token Prediction as an additional training objective
- RL fine-tuning on 130K mathematics and code problems
For detailed training information, please refer to the original model card.
Citation
If you use this model, please cite the original work:
@misc{xiaomi2025mimo,
title={MiMo: Unlocking the Reasoning Potential of Language Model โ From Pretraining to Posttraining},
author={{Xiaomi LLM-Core Team}},
year={2025},
primaryClass={cs.CL},
url={https://github.com/XiaomiMiMo/MiMo},
}
Acknowledgments
Original model development by Xiaomi LLM-Core Team.
- Downloads last month
- 758
Model tree for jedisct1/MiMo-7B-RL-GGUF
Base model
XiaomiMiMo/MiMo-7B-RL