Qwen3-ASR-1.7B → RK3588 Model Conversion
Convert Qwen/Qwen3-ASR-1.7B to RKNN + RKLLM for RK3588.
- Audio encoder (thinker.audio_tower) → ONNX → RKNN
- Text LLM (thinker.model + thinker.lm_head) → standard Qwen3ForCausalLM → RKLLM
- FP16 throughout, no quantization
Layout
convert/
├── audio_encoder/
│   ├── common.py                        shared helpers
│   ├── export_audio_encoder_onnx.py     PyTorch → ONNX
│   ├── export_audio_encoder_rknn.py     ONNX → RKNN
│   └── onnx_run_audio_encoder.py        ONNX parity check (optional)
└── llm/
    ├── extract_qwen3_text_model.py      extract standard Qwen3 text weights
    └── export_rkllm_direct.py           HF → RKLLM
Setup
Place the original model in your working directory and clone the official QwenLM/Qwen3-ASR repo next to the model files (audio_encoder/common.py imports modeling_qwen3_asr from it):
huggingface-cli download Qwen/Qwen3-ASR-1.7B --local-dir .
git clone https://github.com/QwenLM/Qwen3-ASR.git
Host dependencies: torch transformers safetensors numpy scipy soundfile onnx onnxruntime, plus rknn-toolkit2 and rkllm-toolkit.
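Before starting a conversion it can help to confirm that the host environment actually has everything installed. The following is a minimal sketch, not part of this repo. It assumes each package is importable under the top-level name listed, and in particular that rknn-toolkit2 and rkllm-toolkit install as `rknn` and `rkllm`; adjust the list if your installation differs.

```python
from importlib.util import find_spec

# Import names assumed for each package in the dependency list above.
# `rknn` and `rkllm` are the assumed module names of the two Rockchip toolkits.
REQUIRED_MODULES = [
    "torch", "transformers", "safetensors", "numpy", "scipy",
    "soundfile", "onnx", "onnxruntime", "rknn", "rkllm",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only locates modules without importing them, so it stays fast even when heavy packages such as torch are present.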
1. LLM → RKLLM
First extract clean Qwen3 text weights. Feeding the original model to RKLLM directly makes it misidentify the checkpoint as a vision model because of leftover mrope/vision_config fields:
python convert/llm/extract_qwen3_text_model.py \
  --model-path . \
  --output-dir ./qwen3_text_hf
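To illustrate the idea behind the extraction step, here is a rough sketch of the kind of key remapping and config cleanup involved. It is not the code of extract_qwen3_text_model.py. The prefix mapping is an assumption derived from the component names in this README (thinker.model, thinker.lm_head), and `drop_fields` is a hypothetical helper; the real checkpoint keys and config fields may differ.

```python
# Prefix mapping assumed from the component names used in this README;
# the actual checkpoint layout and the real script's logic may differ.
PREFIX_MAP = {
    "thinker.model.": "model.",
    "thinker.lm_head.": "lm_head.",
}


def remap_text_keys(state_dict):
    """Keep only text-LLM tensors and rename them to Qwen3ForCausalLM-style keys."""
    remapped = {}
    for key, tensor in state_dict.items():
        for old, new in PREFIX_MAP.items():
            if key.startswith(old):
                remapped[new + key[len(old):]] = tensor
                break
        # Keys with no matching prefix (e.g. thinker.audio_tower.*) are dropped.
    return remapped


def drop_fields(config, names):
    """Return a copy of a config dict without the named top-level fields."""
    return {k: v for k, v in config.items() if k not in names}
```

For example, `drop_fields(config, ["vision_config"])` would remove one of the leftover fields mentioned above, while the remapping discards the audio tower weights that the text-only model does not need.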
Then convert to RKLLM:
python convert/llm/export_rkllm_direct.py \
  --model-path ./qwen3_text_hf \
  --target-platform rk3588 --num-npu-core 3 \
  --dtype float16 --max-context 4096 \
  --savepath ./rknn/language_model.rkllm
2. Audio encoder → RKNN
The audio tower is wrapped as a static "100 mel frames / chunk" model (this matches the model's native processing granularity); long audio is split, run chunk-by-chunk and concatenated at runtime.
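The split, run, and concatenate flow described above can be sketched as follows. This is an illustration only, not the runtime code from this repo: `encode_chunk` is a hypothetical callable standing in for the RKNN model, padding the last chunk with `pad_frame` is an assumption, and a real pipeline would likely also trim outputs that correspond to padded frames.

```python
CHUNK_FRAMES = 100  # static chunk length stated in this README


def plan_chunks(total_frames, chunk_frames=CHUNK_FRAMES):
    """Return (start, end, pad) triples covering total_frames mel frames.

    `pad` is how many frames the final chunk is short of the static length.
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    plan = []
    for start in range(0, total_frames, chunk_frames):
        end = min(start + chunk_frames, total_frames)
        plan.append((start, end, chunk_frames - (end - start)))
    return plan


def encode_long_audio(frames, encode_chunk, chunk_frames=CHUNK_FRAMES, pad_frame=None):
    """Run encode_chunk on each fixed-size chunk and concatenate the outputs.

    `encode_chunk` is a hypothetical callable that takes a list of exactly
    chunk_frames frames and returns a list of output items.
    """
    outputs = []
    for start, end, pad in plan_chunks(len(frames), chunk_frames):
        chunk = list(frames[start:end]) + [pad_frame] * pad
        outputs.extend(encode_chunk(chunk))
    return outputs
```

With 250 input frames, the plan has three chunks and the last one needs 50 frames of padding to reach the static length.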
# PyTorch → ONNX
python convert/audio_encoder/export_audio_encoder_onnx.py \
  --model-path . --savepath ./onnx/qwen3_asr_audio_chunk100.onnx
# (optional) parity check, expect max_abs_diff ≈ 1e-7
python convert/audio_encoder/onnx_run_audio_encoder.py \
  --model-path . --onnx-path ./onnx/qwen3_asr_audio_chunk100.onnx \
  --audio-path asr_example_zh.wav --compare-torch
# ONNX → RKNN
python convert/audio_encoder/export_audio_encoder_rknn.py \
  --onnx-path ./onnx/qwen3_asr_audio_chunk100.onnx \
  --target-platform rk3588 --savepath ./rknn/audio_encoder.rknn
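The optional parity check above reports a max_abs_diff value. As a reminder of what that metric means, here is a small sketch that computes it over two flattened output sequences. It is not taken from onnx_run_audio_encoder.py, and the `tolerance` default in `outputs_match` is a hypothetical threshold, not a value from this repo.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def outputs_match(reference, candidate, tolerance=1e-5):
    """True if the two outputs agree within the (hypothetical) tolerance."""
    return max_abs_diff(reference, candidate) <= tolerance
```

A value far above the expected order of magnitude usually points to an export problem rather than ordinary floating-point noise.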
3. Artifacts
rknn/
├── audio_encoder.rknn
└── language_model.rkllm
These plug directly into run_qwen3_asr_e2e.py at the repo root.
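A quick way to confirm that both conversions produced output before running the end-to-end script is sketched below. The file names come from the artifact listing above; the helper itself is hypothetical and only checks that each file exists and is non-empty, not that it is a valid model.

```python
from pathlib import Path

# File names taken from the artifact listing in this README.
EXPECTED_ARTIFACTS = ("audio_encoder.rknn", "language_model.rkllm")


def missing_artifacts(output_dir, names=EXPECTED_ARTIFACTS):
    """Return the expected artifact names that are absent or empty in output_dir."""
    root = Path(output_dir)
    missing = []
    for name in names:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing artifacts:", missing_artifacts("./rknn") or "none")
```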