qwen2.5-coder-3b-agent-v1

This repository contains a LoRA adapter, not a full standalone model.

It is the second-stage adapter in the project and was created by continuing fine-tuning from:

M-Alkassem/qwen2.5-coder-3b-unsloth-lora

The goal of this stage was to make the model more useful in a constrained tool-using workflow, especially for multi-step coding and debugging behavior.

What This Model Is

This adapter is the agent-oriented continued fine-tune in the project.

Training goal:

improve multi-step software-engineering behavior
improve inspect → reason → edit → test style behavior
make the model more useful inside a lightweight coding-agent loop

This adapter should be loaded on top of the Qwen2.5-Coder-3B-Instruct base model, as shown in How To Load.

Important Context

This adapter was not trained from scratch.

The training path was:

base model: unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit
coding-focused adapter: M-Alkassem/qwen2.5-coder-3b-unsloth-lora
agent-oriented continued fine-tune: this repository

That means this adapter represents the latest learned state after both fine-tuning stages.

Dataset

This adapter was trained on a sampled subset of:

ernie-research/MEnvData-SWE-Trajectory

Project training setup:

sampled rows: 700
formatting strategy: tail-capped trajectory formatting to fit the token budget
max sequence length: 1024
training steps: 150
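
The card does not include the formatting code, so the following is only a minimal sketch of what tail-capping could look like, assuming it means keeping the most recent turns of a trajectory and dropping older ones once the token budget is exceeded. The `role`/`content` turn schema, the function names, and the whitespace-based token count are hypothetical stand-ins; a real pipeline would count tokens with the model tokenizer against the 1024-token limit.

```python
from typing import Dict, List


def count_tokens(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def tail_cap(turns: List[Dict[str, str]], max_tokens: int) -> List[Dict[str, str]]:
    """Keep the newest turns whose combined size fits within max_tokens."""
    kept: List[Dict[str, str]] = []
    budget = max_tokens
    # Walk from the newest turn backwards so the tail of the trajectory survives.
    for turn in reversed(turns):
        size = count_tokens(turn["content"])
        if size <= budget:
            kept.append(turn)
            budget -= size
        else:
            if not kept and budget > 0:
                # The newest turn alone is too long: keep only its last words.
                words = turn["content"].split()[-budget:]
                kept.append({"role": turn["role"], "content": " ".join(words)})
            break
    kept.reverse()  # restore chronological order
    return kept


# Hypothetical three-turn trajectory (4, 5, and 5 words respectively).
trajectory = [
    {"role": "user", "content": "fix the failing test"},
    {"role": "assistant", "content": "open stack.py and read pop"},
    {"role": "tool", "content": "IndexError: pop from empty list"},
]

capped = tail_cap(trajectory, max_tokens=10)
print([turn["role"] for turn in capped])  # ['assistant', 'tool']
```

Capping from the tail rather than the head keeps the final edit and test steps of a trajectory, which is usually where the outcome of a debugging session is recorded.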

Training Summary

This adapter was trained with supervised fine-tuning (SFT) using LoRA on a 4-bit quantized base model.

Key setup:

continued from the coding adapter
batch size per device: 1
gradient accumulation: 16
learning rate: 5e-5
optimizer: adamw_8bit
hardware: Google Colab Tesla T4
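
As a rough check on how much data this setup covers, the arithmetic below combines the values listed here with the row and step counts from the Dataset section. It assumes a single GPU and that each sampled row produces exactly one training sequence; neither is stated explicitly in this card.

```python
# Values taken from this card.
per_device_batch_size = 1
gradient_accumulation_steps = 16
training_steps = 150
sampled_rows = 700

# Assumption: training ran on one GPU.
num_devices = 1

# Sequences contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Total sequences processed over the whole run.
sequences_seen = effective_batch_size * training_steps

# Approximate passes over the sampled subset (assumes one sequence per row).
approx_epochs = sequences_seen / sampled_rows

print(effective_batch_size)      # 16
print(sequences_seen)            # 2400
print(round(approx_epochs, 2))   # 3.43
```

Under these assumptions the run amounts to a little over three passes through the 700 sampled rows.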

Observed result:

final training loss: about 1.2940

Intended Use

Use this adapter when you want:

a model that is better suited for a constrained coding-agent workflow
more agent-style behavior in inspect/edit/test tasks
a reasoning core for a lightweight tool-using coding agent

This adapter is most meaningful when paired with:

a controller loop
file tools
Python execution tools
iterative feedback from tool outputs
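
To make these pieces concrete, here is a minimal sketch of a controller loop with file tools and a Python execution tool. It is not the project's agent_v2 code: every name is hypothetical, the workspace is an in-memory dict, and a scripted policy stands in for the model so the example runs without any weights. In a real setup the policy would call the adapter and parse a tool call from its output.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, Dict[str, str]]  # (tool name, keyword arguments)
Policy = Callable[[List[str]], Optional[Action]]


def make_tools(workspace: Dict[str, str]) -> Dict[str, Callable[..., str]]:
    """Build file tools and a Python execution tool over an in-memory workspace."""

    def read_file(path: str) -> str:
        return workspace.get(path, "")

    def write_file(path: str, content: str) -> str:
        workspace[path] = content
        return f"wrote {path}"

    def run_python(path: str) -> str:
        # exec keeps the sketch short; a real agent should run code in a sandbox.
        try:
            exec(workspace[path], {})
            return "ok"
        except Exception as exc:
            return f"error: {type(exc).__name__}: {exc}"

    return {"read_file": read_file, "write_file": write_file, "run_python": run_python}


def run_agent(policy: Policy, tools: Dict[str, Callable[..., str]], max_steps: int = 8) -> List[str]:
    """Ask the policy for an action, run the tool, and feed the output back."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = policy(observations)
        if action is None:  # the policy signals that it is done
            break
        name, args = action
        observations.append(tools[name](**args))
    return observations


# Hypothetical task: pop() on an empty stack should return None instead of raising.
BUGGY = "def pop(items):\n    return items.pop()\n\nassert pop([]) is None\n"
FIXED = "def pop(items):\n    return items.pop() if items else None\n\nassert pop([]) is None\n"


def scripted_policy(observations: List[str]) -> Optional[Action]:
    """Stand-in for the model: run the test, rewrite on failure, rerun, then stop."""
    if not observations:
        return ("run_python", {"path": "stack.py"})
    if observations[-1].startswith("error"):
        return ("write_file", {"path": "stack.py", "content": FIXED})
    if observations[-1].startswith("wrote"):
        return ("run_python", {"path": "stack.py"})
    return None  # the last run succeeded


workspace = {"stack.py": BUGGY}
log = run_agent(scripted_policy, make_tools(workspace))
print(log)
```

The `max_steps` cap is one simple way to keep the loop constrained: the run stops either when the policy returns `None` or when the step budget is exhausted.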

Limitations

This adapter is not a standalone merged model.

It also did not perform best in the plain direct-answer benchmark used in the project. In that evaluation, the original base model remained strongest overall.

So this adapter should not be presented as universally better at plain coding Q&A. Its value is more visible in tool-using and multi-step agent-style workflows.

How To Load

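# Requires: pip install torch transformers peft bitsandbytes accelerate
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_MODEL = "Qwen/Qwen2.5-Coder-3B-Instruct"
ADAPTER_MODEL = "M-Alkassem/qwen2.5-coder-3b-agent-v1"

# Quantize the base model to 4-bit NF4 with double quantization and float16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# The tokenizer comes from the base model; fall back to EOS if no pad token is set.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the quantized base model and let accelerate place it on the available device.
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter from this repository and switch to inference mode.
model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)
model.eval()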

Example Prompt

prompt = "A stack implementation fails a unit test when pop() is called on an empty stack. Explain how you would debug this step by step and propose a fix."

Project Context

This adapter is part of a larger project with:

a coding-focused fine-tune
an agent-oriented continued fine-tune
a direct-answer benchmark comparing base vs coding adapter vs agent adapter
a constrained agent_v2 prototype with file and Python tools

In the documented agent_v2 run, the model was able to:

run failing tests
detect a bug
rewrite code
rerun tests
stop after success

This is the main reason this adapter should be evaluated in both:

direct-answer mode
tool-using agent mode

References

Coding adapter: https://huggingface.co/M-Alkassem/qwen2.5-coder-3b-unsloth-lora
Base Qwen2.5-Coder model: https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct
Unsloth quantized base: https://huggingface.co/unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit
Dataset card: https://huggingface.co/datasets/ernie-research/MEnvData-SWE-Trajectory

Citation

If you use this adapter, please cite the upstream Qwen2.5-Coder work and the dataset used for the agent-oriented continued fine-tune.

@article{hui2024qwen2p5coder,
  title={Qwen2.5-Coder Technical Report},
  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jing and Liu, Dayiheng and Zhang, Liqun and Liu, Tianyang and Zhang, Jiawei and Yu, Bo and Lu, Kaican and others},
  journal={arXiv preprint arXiv:2409.12186},
  year={2024}
}