BenchFlow Qwen3.5-9B Env-0 Qwen397B-Data Custom SFT LoRA Adapter

This repository contains the current BenchFlow env-0 SFT adapter for Qwen/Qwen3.5-9B. It is a PEFT LoRA adapter only; load it with the base Qwen/Qwen3.5-9B checkpoint.

Current Version

Field Value
Adapter repo benchflow/benchflow-qwen35-9b
Published model PR HF PR #4
Adapter commit promoted from PR 92380a83764ec2d8b2103a3895e24e49a508d1d9
Training run id qwen35-397b-data-qwen35-9b-custom-sft-20260630T042600Z
Base checkpoint Qwen/Qwen3.5-9B
Adapter type LoRA / PEFT
Trainer Custom PyTorch + PEFT LoRA trainer, experiments/env-0-posttrain-mvp/train_lora_sft.py
Source data Qwen3.5-397B teacher trajectories collected with BenchFlow, OpenHands, and Daytona
Training rows 298 all-training-ready rows
Hardware 1x H100 80GB

This run used the historical custom trainer. Prime-RL was not used as the SFT trainer; the source data path includes prime-rl only because the trajectories were also validated and exported in Prime-SFT-compatible format.

Training Recipe

Field Value
Precision BF16
Quantization None
Context length 8192
Max trainer steps 300 micro-batch steps
Micro batch size 1
Gradient accumulation 8
Approx optimizer updates 37
Learning rate 1e-4
Scheduler None
Max grad norm 1.0
LoRA rank 32
LoRA alpha 64
LoRA dropout 0.05
LoRA targets q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Checkpoints 100, 200, 300, best_adapter, final_adapter
Final eval loss 0.1476329118013382

Evaluation Results

Evaluation Runtime Strict pass
Mobile300 SGLang 135 / 300
Mobile300 Fireworks 134 / 300
standard60, 3 trials Fireworks 25 / 180

Artifact Links

Artifact Link
Source teacher trajectories HF dataset folder
Training artifacts HF dataset folder
Fireworks Mobile300 eval HF dataset folder
Fireworks standard60 eval HF dataset folder
Reproduction report GitHub report

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3.5-9B"
adapter_id = "benchflow/benchflow-qwen35-9b"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)

Intended Use And Limitations

This adapter is an env-0 research artifact for controlled BenchFlow/OpenHands/Daytona evaluation. It is not a general-purpose safety-tested assistant model and should not be treated as production-ready for autonomous operation.

Downloads last month
171
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for benchflow/benchflow-qwen35-9b

Finetuned
Qwen/Qwen3.5-9B
Adapter
(384)
this model

Dataset used to train benchflow/benchflow-qwen35-9b