BenchFlow Qwen3.5-9B Env-0 Qwen397B-Data Custom SFT LoRA Adapter

This repository contains the current BenchFlow env-0 SFT adapter for Qwen/Qwen3.5-9B. It is a PEFT LoRA adapter only; load it with the base Qwen/Qwen3.5-9B checkpoint.

Current Version

Field	Value
Adapter repo	`benchflow/benchflow-qwen35-9b`
Published model PR	HF PR #4
Adapter commit promoted from PR	`92380a83764ec2d8b2103a3895e24e49a508d1d9`
Training run id	`qwen35-397b-data-qwen35-9b-custom-sft-20260630T042600Z`
Base checkpoint	`Qwen/Qwen3.5-9B`
Adapter type	LoRA / PEFT
Trainer	Custom PyTorch + PEFT LoRA trainer, `experiments/env-0-posttrain-mvp/train_lora_sft.py`
Source data	Qwen3.5-397B teacher trajectories collected with BenchFlow, OpenHands, and Daytona
Training rows	`298` all-training-ready rows
Hardware	1x H100 80GB

This run used the historical custom trainer. Prime-RL was not used as the SFT trainer; the source data path includes prime-rl only because the trajectories were also validated and exported in Prime-SFT-compatible format.

Training Recipe

Field	Value
Precision	BF16
Quantization	None
Context length	`8192`
Max trainer steps	`300` micro-batch steps
Micro batch size	`1`
Gradient accumulation	`8`
Approx optimizer updates	`37`
Learning rate	`1e-4`
Scheduler	None
Max grad norm	`1.0`
LoRA rank	`32`
LoRA alpha	`64`
LoRA dropout	`0.05`
LoRA targets	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Checkpoints	`100`, `200`, `300`, `best_adapter`, `final_adapter`
Final eval loss	`0.1476329118013382`

Evaluation Results

Evaluation	Runtime	Strict pass
Mobile300	SGLang	`135 / 300`
Mobile300	Fireworks	`134 / 300`
standard60, 3 trials	Fireworks	`25 / 180`

Artifact Links

Artifact	Link
Source teacher trajectories	HF dataset folder
Training artifacts	HF dataset folder
Fireworks Mobile300 eval	HF dataset folder
Fireworks standard60 eval	HF dataset folder
Reproduction report	GitHub report

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3.5-9B"
adapter_id = "benchflow/benchflow-qwen35-9b"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)

Intended Use And Limitations

This adapter is an env-0 research artifact for controlled BenchFlow/OpenHands/Daytona evaluation. It is not a general-purpose safety-tested assistant model and should not be treated as production-ready for autonomous operation.

Downloads last month: 171

Model tree for benchflow/benchflow-qwen35-9b

Base model

Qwen/Qwen3.5-9B-Base

Finetuned

Qwen/Qwen3.5-9B

Adapter

(384)

this model

benchflow
/

benchflow-qwen35-9b