- Libraries
- Transformers.js
How to use thomasjvu/lisper-gemma4-e2b-audio-onnx-q4f16 with Transformers.js:
// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';
// Allocate pipeline
const pipe = await pipeline('image-text-to-text', 'thomasjvu/lisper-gemma4-e2b-audio-onnx-q4f16');
- Local Apps
- Unsloth Studio
How to use thomasjvu/lisper-gemma4-e2b-audio-onnx-q4f16 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for thomasjvu/lisper-gemma4-e2b-audio-onnx-q4f16 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for thomasjvu/lisper-gemma4-e2b-audio-onnx-q4f16 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for thomasjvu/lisper-gemma4-e2b-audio-onnx-q4f16 to start chatting
Load model with FastModel
# pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="thomasjvu/lisper-gemma4-e2b-audio-onnx-q4f16",
    max_seq_length=2048,
)
Lisper Gemma 4 E2B Audio ONNX q4f16
This is the browser/WebGPU q4f16 package for the Lisper Gemma 4 E2B audio fine-tune.
Component Layout
- onnx/embed_tokens_q4f16.onnx: trained Lisper embedding component.
- onnx/decoder_model_merged_q4f16.onnx: trained Lisper decoder component.
- onnx/audio_encoder_q4f16.onnx: Gemma 4 E2B q4f16 audio encoder component.
- onnx/vision_encoder_q4f16.onnx: Gemma 4 E2B q4f16 vision encoder component.
The LoRA training targeted language/text modules; audio modules were not targeted, and vision fine-tuning was disabled. Reusing the official audio/vision components keeps the browser package compatible with the public Gemma 4 E2B ONNX runtime contract while using the trained Lisper text stack.
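The four file names in the component layout share one pattern, `onnx/<component>_<dtype>.onnx`. As a minimal sketch, assuming that pattern holds for every component, the Python helper below builds the expected paths and reports which ones are absent from a file listing. The names `expected_component_paths` and `missing_components` are hypothetical and not part of any library, and the listing itself would come from however you inspect the repository.

```python
from __future__ import annotations

# Component names are taken from the layout above. The
# "onnx/<component>_<dtype>.onnx" pattern is inferred from those names
# and is an assumption, not a documented contract.
COMPONENTS = (
    "embed_tokens",
    "decoder_model_merged",
    "audio_encoder",
    "vision_encoder",
)


def expected_component_paths(dtype: str = "q4f16") -> list[str]:
    """Return the ONNX file paths the package is expected to contain."""
    return [f"onnx/{name}_{dtype}.onnx" for name in COMPONENTS]


def missing_components(repo_files: list[str], dtype: str = "q4f16") -> list[str]:
    """Return expected paths that are absent from a repository file listing."""
    present = set(repo_files)
    return [path for path in expected_component_paths(dtype) if path not in present]


if __name__ == "__main__":
    # Example listing with the vision encoder deliberately left out.
    listing = [
        "onnx/embed_tokens_q4f16.onnx",
        "onnx/decoder_model_merged_q4f16.onnx",
        "onnx/audio_encoder_q4f16.onnx",
    ]
    print(missing_components(listing))
```

A check like this could run before publishing or before pointing the app at a new model ID, since the trained text components and the reused audio/vision components must all be present together.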
Evaluation
The release-quality evaluation result belongs to the full v18 hybrid acoustic+Gemma pipeline, not to a separate browser-only q4f16 eval:
- Held-out rows: 2,000
- Hard errors: 0
- Verdict: pass
- Class match: 0.976
- Clear/non-clear match: 0.989
- Exact four-line format: 1.0
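For a sense of scale, the reported rates can be converted back into approximate row counts. This sketch rests on an assumption the card does not state: each rate is a simple fraction of the 2,000 held-out rows, rounded to three decimals. The resulting counts are therefore approximate, and `approximate_counts` is a hypothetical helper name.

```python
from __future__ import annotations

# Figures reported in the evaluation above.
HELD_OUT_ROWS = 2000
REPORTED_RATES = {
    "class_match": 0.976,
    "clear_non_clear_match": 0.989,
    "exact_four_line_format": 1.0,
}


def approximate_counts(rate: float, total: int = HELD_OUT_ROWS) -> tuple[int, int]:
    """Return (matched, mismatched) row counts implied by a rounded rate."""
    matched = round(rate * total)
    return matched, total - matched


if __name__ == "__main__":
    for name, rate in REPORTED_RATES.items():
        matched, mismatched = approximate_counts(rate)
        print(f"{name}: ~{matched} matched, ~{mismatched} mismatched")
```

Under that assumption, a class match of 0.976 corresponds to roughly 1,952 of 2,000 rows, and a clear/non-clear match of 0.989 to roughly 1,978.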
This q4f16 package is the browser demo artifact for consumer-device testing.
App Use
Configure the app with:
VITE_LISPER_BROWSER_MODEL_ID=thomasjvu/lisper-gemma4-e2b-audio-onnx-q4f16
VITE_LISPER_BROWSER_DTYPE=q4f16
The required browser download is expected to be about 3.15 GB. Keep q4f16 as the primary browser dtype for the hackathon package.
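To put the payload size in consumer-device terms, the sketch below estimates first-load download time at a few connection speeds. It assumes that 3.15 GB means 3.15 × 10^9 bytes and ignores protocol overhead, browser caching, and throttling. The link speeds are illustrative values, not measurements, and `download_seconds` is a hypothetical helper.

```python
PAYLOAD_GB = 3.15  # approximate payload from the card; decimal gigabytes assumed


def download_seconds(payload_gb: float, mbit_per_s: float) -> float:
    """Estimate transfer time in seconds over an idealized link."""
    if mbit_per_s <= 0:
        raise ValueError("mbit_per_s must be positive")
    payload_megabits = payload_gb * 1000 * 8  # GB -> MB -> megabits
    return payload_megabits / mbit_per_s


if __name__ == "__main__":
    for speed in (25, 100, 500):  # illustrative link speeds in Mbit/s
        minutes = download_seconds(PAYLOAD_GB, speed) / 60
        print(f"{speed} Mbit/s: about {minutes:.1f} min")
```

At an idealized 100 Mbit/s this works out to about 252 seconds, a little over four minutes, which is worth keeping in mind when testing the demo on slower connections.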
Companion Artifacts
- LoRA adapter: thomasjvu/lisper-gemma4-e2b-audio-lora
- Merged full checkpoint: thomasjvu/lisper-gemma4-e2b-audio-full
- Server-side ZeroGPU fallback: thomasjvu/lisper-zerogpu