benchflow/env0-experiment-trajectories
Updated • 6.67k
How to use benchflow/benchflow-qwen35-9b with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base_model, "benchflow/benchflow-qwen35-9b")This repository contains the current BenchFlow env-0 SFT adapter for Qwen/Qwen3.5-9B. It is a PEFT LoRA adapter only; load it with the base Qwen/Qwen3.5-9B checkpoint.
| Field | Value |
|---|---|
| Adapter repo | benchflow/benchflow-qwen35-9b |
| Published model PR | HF PR #4 |
| Adapter commit promoted from PR | 92380a83764ec2d8b2103a3895e24e49a508d1d9 |
| Training run id | qwen35-397b-data-qwen35-9b-custom-sft-20260630T042600Z |
| Base checkpoint | Qwen/Qwen3.5-9B |
| Adapter type | LoRA / PEFT |
| Trainer | Custom PyTorch + PEFT LoRA trainer, experiments/env-0-posttrain-mvp/train_lora_sft.py |
| Source data | Qwen3.5-397B teacher trajectories collected with BenchFlow, OpenHands, and Daytona |
| Training rows | 298 all-training-ready rows |
| Hardware | 1x H100 80GB |
This run used the historical custom trainer. Prime-RL was not used as the SFT trainer; the source data path includes prime-rl only because the trajectories were also validated and exported in Prime-SFT-compatible format.
| Field | Value |
|---|---|
| Precision | BF16 |
| Quantization | None |
| Context length | 8192 |
| Max trainer steps | 300 micro-batch steps |
| Micro batch size | 1 |
| Gradient accumulation | 8 |
| Approx optimizer updates | 37 |
| Learning rate | 1e-4 |
| Scheduler | None |
| Max grad norm | 1.0 |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| LoRA dropout | 0.05 |
| LoRA targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Checkpoints | 100, 200, 300, best_adapter, final_adapter |
| Final eval loss | 0.1476329118013382 |
| Evaluation | Runtime | Strict pass |
|---|---|---|
| Mobile300 | SGLang | 135 / 300 |
| Mobile300 | Fireworks | 134 / 300 |
| standard60, 3 trials | Fireworks | 25 / 180 |
| Artifact | Link |
|---|---|
| Source teacher trajectories | HF dataset folder |
| Training artifacts | HF dataset folder |
| Fireworks Mobile300 eval | HF dataset folder |
| Fireworks standard60 eval | HF dataset folder |
| Reproduction report | GitHub report |
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_id = "Qwen/Qwen3.5-9B"
adapter_id = "benchflow/benchflow-qwen35-9b"
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
base_id,
torch_dtype="auto",
device_map="auto",
trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
This adapter is an env-0 research artifact for controlled BenchFlow/OpenHands/Daytona evaluation. It is not a general-purpose safety-tested assistant model and should not be treated as production-ready for autonomous operation.