qwen2.5-coder-3b-agent-v1
This repository contains a LoRA adapter, not a full standalone model.
It is the second-stage adapter in the project and was created by continuing fine-tuning from:
M-Alkassem/qwen2.5-coder-3b-unsloth-lora
The goal of this stage was to make the model more useful in a constrained tool-using workflow, especially for multi-step coding and debugging tasks.
What This Model Is
This adapter is the agent-oriented continued fine-tune in the project.
Training goal:
- improve multi-step software-engineering behavior
- improve inspect → reason → edit → test style behavior
- make the model more useful inside a lightweight coding-agent loop
This adapter should be loaded on top of the Qwen2.5-Coder-3B-Instruct base model (training used the 4-bit unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit checkpoint).
Important Context
This adapter was not trained from scratch.
The training path was:
- base model: unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit
- coding-focused adapter: M-Alkassem/qwen2.5-coder-3b-unsloth-lora
- agent-oriented continued fine-tune: this repository
That means this adapter represents the latest learned state after both fine-tuning stages.
Dataset
This adapter was trained on a sampled subset of:
ernie-research/MEnvData-SWE-Trajectory
Project training setup:
- sampled rows: 700
- formatting strategy: tail-capped trajectory formatting to fit the token budget
- max sequence length: 1024
- training steps: 150
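The card does not spell out how tail-capping works. A minimal sketch, assuming it means keeping the first message (the task) plus the most recent trajectory turns that still fit the token budget, dropping whole messages from the middle. `tail_cap` and `count_tokens` are hypothetical names; a real pipeline would count tokens with the Qwen tokenizer after applying the chat template.

```python
from typing import Callable


def tail_cap(
    messages: list[dict], count_tokens: Callable[[str], int], budget: int = 1024
) -> list[dict]:
    """Keep the first message (the task) plus the longest suffix of later turns that fits."""
    head, rest = messages[:1], messages[1:]
    used = sum(count_tokens(m["content"]) for m in head)
    kept: list[dict] = []
    # Walk backwards from the newest turn so the end of the trajectory is preserved.
    for msg in reversed(rest):
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return head + kept[::-1]
```

Keeping the tail rather than the head favors the final edit and test steps of a trajectory, which is where the inspect, edit, test behavior this stage targets is most visible.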
Training Summary
This model was trained with supervised fine-tuning (SFT) using LoRA and 4-bit quantization.
Key setup:
- continued from the coding adapter
- batch size per device: 1
- gradient accumulation: 16
- learning rate: 5e-5
- optimizer: adamw_8bit
- hardware: Google Colab, Tesla T4
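The settings above imply roughly how many passes over the sampled data the run made. A quick check, assuming a single GPU and one training example per sampled row (no packing):

```python
# Values from the training setup above.
per_device_batch_size = 1
gradient_accumulation_steps = 16
num_devices = 1  # assumption: a single Colab T4
max_steps = 150
sampled_rows = 700

# Each optimizer step sees batch_size * accumulation * devices examples.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
examples_seen = effective_batch_size * max_steps
approx_epochs = examples_seen / sampled_rows

print(effective_batch_size, examples_seen, round(approx_epochs, 2))  # 16 2400 3.43
```

Under these assumptions the run made about 3.4 passes over the 700 sampled rows.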
Observed result:
- final training loss: about 1.2940
Intended Use
Use this adapter when you want:
- a model that is better suited for a constrained coding-agent workflow
- more agent-style behavior in inspect/edit/test tasks
- a reasoning core for a lightweight tool-using coding agent
This adapter is most meaningful when paired with:
- a controller loop
- file tools
- Python execution tools
- iterative feedback from tool outputs
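The card does not document the controller itself. The following is a minimal, hypothetical sketch of that kind of loop: the model, wrapped in a `generate` callable that maps a chat history to a reply string, emits one JSON action per step; the controller runs a file or Python tool and feeds the tool output back as the next user turn. The tool names, the JSON action format, and `agent_loop` are assumptions for illustration, not the project's agent_v2 implementation.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Callable


# Hypothetical tools; not the project's agent_v2 tool definitions.
def read_file(workdir: Path, path: str) -> str:
    return (workdir / path).read_text()


def write_file(workdir: Path, path: str, content: str) -> str:
    (workdir / path).write_text(content)
    return f"wrote {len(content)} chars to {path}"


def run_python(workdir: Path, path: str) -> str:
    # Runs model-written code: use a sandbox or throwaway directory in practice.
    proc = subprocess.run(
        [sys.executable, path], cwd=workdir, capture_output=True, text=True, timeout=60
    )
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


TOOLS = {"read_file": read_file, "write_file": write_file, "run_python": run_python}


def agent_loop(
    generate: Callable[[list[dict]], str], task: str, workdir: Path, max_steps: int = 8
) -> list[dict]:
    """Ask for one JSON action per step, run the tool, and feed its output back."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = generate(history)
        history.append({"role": "assistant", "content": reply})
        try:
            # Assumed action format: {"tool": name, "args": {...}} or {"done": summary}.
            action = json.loads(reply)
        except json.JSONDecodeError as exc:
            history.append({"role": "user", "content": f"invalid action JSON: {exc}"})
            continue
        if "done" in action:
            break
        try:
            observation = TOOLS[action["tool"]](workdir, **action["args"])
        except Exception as exc:  # unknown tool, bad args, timeout, missing file
            observation = f"error: {exc}"
        history.append({"role": "user", "content": f"tool output:\n{observation}"})
    return history
```

A scripted `generate` that replays a fixed list of JSON actions is a convenient way to test the loop before plugging in the model.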
Limitations
This adapter is not a standalone merged model.
It also did not score best on the project's plain direct-answer benchmark; in that evaluation, the original base model remained the strongest overall.
So this adapter should not be presented as universally better at plain coding Q&A. Its value is more visible in tool-using and multi-step agent-style workflows.
How To Load
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
BASE_MODEL = "Qwen/Qwen2.5-Coder-3B-Instruct"
ADAPTER_MODEL = "M-Alkassem/qwen2.5-coder-3b-agent-v1"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
base_model = AutoModelForCausalLM.from_pretrained(
BASE_MODEL,
quantization_config=bnb_config,
torch_dtype=torch.float16,
device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)
model.eval()
Example Prompt
prompt = "A stack implementation fails a unit test when pop() is called on an empty stack. Explain how you would debug this step by step and propose a fix."
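To query the adapter, the example prompt can be wrapped in the Qwen chat template and passed to `generate()`. A minimal sketch, assuming the `model` and `tokenizer` objects from the loading code; `generate_reply` is a hypothetical helper, and greedy decoding with `max_new_tokens=512` is an illustrative choice, not a setting from the project.

```python
def generate_reply(model, tokenizer, prompt: str, max_new_tokens: int = 512) -> str:
    """Format `prompt` with the chat template, generate, and decode only the new tokens."""
    messages = [{"role": "user", "content": prompt}]
    # add_generation_prompt=True appends the assistant turn header so the model answers.
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding (assumed, not from the project)
        pad_token_id=tokenizer.pad_token_id,
    )
    # generate() returns prompt + completion; drop the prompt tokens before decoding.
    prompt_len = len(inputs["input_ids"][0])
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

With the objects above, `print(generate_reply(model, tokenizer, prompt))` prints the model's step-by-step debugging answer.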
Project Context
This adapter is part of a larger project with:
- a coding-focused fine-tune
- an agent-oriented continued fine-tune
- a direct-answer benchmark comparing the base model, the coding adapter, and the agent adapter
- a constrained agent_v2 prototype with file and Python tools

In the documented agent_v2 run, the model was able to:
- run failing tests
- detect a bug
- rewrite code
- rerun tests
- stop after success

This is the main reason this adapter should be evaluated in both:
- direct-answer mode
- tool-using agent mode

References
- Coding adapter: https://huggingface.co/M-Alkassem/qwen2.5-coder-3b-unsloth-lora
- Base Qwen2.5-Coder model: https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct
- Unsloth quantized base: https://huggingface.co/unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit
- Dataset card: https://huggingface.co/datasets/ernie-research/MEnvData-SWE-Trajectory
Citation
If you use this adapter, please cite the upstream Qwen2.5-Coder work and the dataset used for the agent-oriented continued fine-tune.
@article{hui2024qwen2p5coder,
title={Qwen2.5-Coder Technical Report},
author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jing and Liu, Dayiheng and Zhang, Liqun and Liu, Tianyang and Zhang, Jiawei and Yu, Bo and Lu, Kaican and others},
journal={arXiv preprint arXiv:2409.12186},
year={2024}
}