Qwen3-0.6B-lk-alpha-40k-MNN

Experimental Qwen3-0.6B draft bundle for TokForge + MNN, trained with LK Alpha on a larger 40K draft-training set.

Why this repo exists

This repo is for people exploring whether larger draft datasets materially improve mobile speculative decoding:

Qwen3-0.6B student
Qwen3-8B teacher lane
40K training set
LK Alpha objective
exported as a mobile-ready MNN bundle

Training snapshot

Final logged training acceptance (alpha): 0.7314
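As a rough way to read that number, the sketch below turns an acceptance rate into an expected number of tokens emitted per target verification pass. It uses the standard idealization from the speculative decoding literature that each drafted token is accepted independently with probability alpha. That is an assumption here: the logged training alpha is not guaranteed to match per-token acceptance on a device, and the function name `expected_tokens_per_step` is illustrative, not part of TokForge or MNN.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target verification pass.

    Assumes every drafted token is accepted independently with probability
    `alpha` (an idealization). The target always contributes one token, so
    the expectation is 1 + alpha + ... + alpha**draft_len, which has the
    closed form used below.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Logged training alpha from this card, with a draft length of 3.
    print(round(expected_tokens_per_step(0.7314, 3), 2))  # prints 2.66
```

Under these assumptions, a draft length of 3 would yield roughly 2.66 tokens per target pass. Treat this as an upper-level estimate for comparison between drafts, not as a measured device result.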

Usage

This bundle is meant for TokForge / MNN, not standard Hugging Face (HF) inference.

Typical TokForge recipe:

{
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}
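If you generate the recipe from a script, a minimal sketch like the following may help. It only reuses the keys and values shown above. The helper name `write_recipe`, the output file name, and the sanity check are assumptions made for illustration; how TokForge actually locates and loads the file is not covered here.

```python
import json
import tempfile
from pathlib import Path

# Keys and values copied from the recipe above; nothing else is assumed
# about the TokForge / MNN config schema.
BASE_RECIPE = {
    "backend_type": "opencl",
    "thread_num": 4,
    "precision": "low",
    "memory": "low",
    "sampler_type": "greedy",
    "speculative_type": "draftmodel",
    "draft_predict_length": 3,
}


def write_recipe(out_path: str, draft_config_path: str) -> dict:
    """Write the recipe as JSON and return the dict that was written.

    `write_recipe` is a hypothetical helper, not a TokForge API.
    """
    recipe = dict(BASE_RECIPE, draft_config_path=draft_config_path)
    # Illustrative sanity check: a draft length below 1 drafts nothing.
    if recipe["draft_predict_length"] < 1:
        raise ValueError("draft_predict_length must be at least 1")
    Path(out_path).write_text(json.dumps(recipe, indent=2) + "\n", encoding="utf-8")
    return recipe


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "recipe.json"  # hypothetical file name
        write_recipe(str(out), "/path/to/config_cpu.json")
        print(out.read_text(encoding="utf-8"))
```

Replace the placeholder `/path/to/config_cpu.json` with the location of the draft config in your copy of the bundle.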

Status

This is currently best treated as a research / comparison artifact:

useful if you want to compare drafts trained on 20K vs 40K sets
not yet a clear on-device winner over the simpler 20K drafts

Limitations and Intended Use

This is a research comparison artifact first.
The preserved on-device evidence for this variant is currently no stronger than for the simpler 20K drafts.
Mobile performance still depends more on target pairing and backend routing than on a small training-objective delta alone.