Qwen3-0.6B-lk-alpha-40k-MNN

Experimental Qwen3-0.6B draft bundle for TokForge + MNN, trained with LK Alpha on a larger 40K draft-training set.

Why this repo exists

This repo is for people exploring whether larger draft-training datasets materially improve mobile speculative decoding. The setup:

  • Qwen3-0.6B student
  • Qwen3-8B teacher lane
  • 40K training set
  • LK Alpha objective
  • exported as a mobile-ready MNN bundle

Training snapshot

Final logged training acceptance (alpha):

  • 0.7314
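As a rough way to read this number, the sketch below converts an acceptance rate into expected tokens per target verification pass for a given draft length. It assumes each drafted token is accepted independently with probability alpha, which is a simplification, and it treats the logged training alpha as a stand-in for on-device acceptance, which it may not match. The function name is hypothetical and not part of TokForge or MNN.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target verification pass.

    Assumption: every drafted token is accepted independently with
    probability alpha, so the expectation is the truncated geometric
    series 1 + alpha + ... + alpha**draft_len.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


ALPHA = 0.7314  # final logged training acceptance from this card

# Compare a few draft lengths around the d=3 setting used in this card.
for d in (1, 2, 3, 4):
    print(f"d={d}: {expected_tokens_per_step(ALPHA, d):.2f} tokens per step")
```

This is an upper-level estimate only: real speedup also depends on draft latency, target latency, and backend routing, none of which this formula models.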

Usage

This bundle is meant for TokForge / MNN, not for standard Hugging Face inference.

Typical TokForge recipe:

{
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}
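If you generate this recipe from a script (for example, to sweep draft lengths in a benchmark), a minimal sketch could look like the following. The keys and default values are copied from the recipe above; the helper names and the output file name are hypothetical, and how TokForge actually consumes the resulting file is not specified here.

```python
import json
from pathlib import Path


def build_recipe(draft_config_path: str, draft_predict_length: int = 3) -> dict:
    """Return the recipe from this card as a dict.

    Only the draft config path and draft length are parameterized;
    every other value mirrors the recipe shown above.
    """
    if draft_predict_length < 1:
        raise ValueError("draft_predict_length must be at least 1")
    return {
        "backend_type": "opencl",
        "thread_num": 4,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
        "speculative_type": "draftmodel",
        "draft_predict_length": draft_predict_length,
        "draft_config_path": draft_config_path,
    }


def write_recipe(recipe: dict, out_path: str) -> Path:
    """Serialize the recipe to a JSON file and return its path."""
    path = Path(out_path)
    path.write_text(json.dumps(recipe, indent=2) + "\n", encoding="utf-8")
    return path


# Print the default recipe; the path is the same placeholder used above.
print(json.dumps(build_recipe("/path/to/config_cpu.json"), indent=2))
```

To write it to disk, call `write_recipe(build_recipe(...), "recipe.json")`, where `recipe.json` is an assumed file name.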

Status

This is currently best treated as a research / comparison artifact:

  • useful if you want to compare drafts trained on 20K vs 40K sets
  • has not yet shown a clear on-device win over the simpler 20K drafts

Limitations and Intended Use

  • This is a research comparison artifact first.
  • We do not currently have stronger recorded on-device evidence for this variant than for the simpler 20K drafts.
  • Mobile performance still depends more on target pairing and backend routing than on a small training-objective delta alone.

Best-known use

  • Draft model backend: CPU
  • Draft threads: 2
  • Draft predict length: d=3
  • Typical target pairing: Qwen3-8B

Included files

  • llm.mnn
  • llm.mnn.weight
  • llm_config.json
  • config.json
  • config_cpu.json
  • tokenizer files
  • ONNX export artifact for reference
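After downloading, a quick sanity check that the named files are present can save a confusing load failure later. The sketch below checks only the file names listed above; tokenizer files and the ONNX artifact are not given exact names in this card, so they are not checked. The function name and the directory path are hypothetical.

```python
from pathlib import Path

# File names taken verbatim from the list above.
EXPECTED_FILES = (
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "config.json",
    "config_cpu.json",
)


def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the expected file names that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_bundle_files("/path/to/bundle")` returns an empty list when all five files are in place.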

TokForge

If you benchmark this on your own device, feel free to share results in Discord.
