WindyWord
/

listen-windy-pro-engine

@@ -6,6 +6,7 @@ tags:
 - windyword
 - english
 - multilingual
 library_name: transformers
 pipeline_tag: automatic-speech-recognition
 language:
@@ -15,13 +16,32 @@ language:
 # WindyWord.ai STT — Windy Pro Engine
-**Multilingual speech-to-text engine. Transcribes audio in 100+ languages, with English as the primary trained domain.**
 ## Profile
 - **Architecture:** 1.55B params · whisper-large-v3
 - **Profile:** premium / max accuracy
 - **Base model:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
 ## Variants in this repo
@@ -43,7 +63,6 @@ model = WhisperForConditionalGeneration.from_pretrained("WindyWord/listen-windy-
 For CPU inference via CTranslate2:
 ```python
 import ctranslate2
-# After downloading the ct2-int8 subfolder:
 model = ctranslate2.models.Whisper("path/to/ct2-int8/")
 ```

 - windyword
 - english
 - multilingual
+- multilingual-fallback
 library_name: transformers
 pipeline_tag: automatic-speech-recognition
 language:
 # WindyWord.ai STT — Windy Pro Engine
+**The flagship multilingual speech-to-text engine. Transcribes audio in 99+ languages with state-of-the-art quality.**
+## Recommended fallback for low-resource languages
+This is the **multilingual fallback model** for the WindyWord STT fleet. When a language-specific Lingua model is missing or underperforms (we explicitly flag these in the language-specific READMEs), production users should route through this model with the appropriate `language=` hint:
+```python
+from transformers import WhisperForConditionalGeneration, WhisperProcessor
+processor = WhisperProcessor.from_pretrained("WindyWord/listen-windy-pro-engine", subfolder="safetensors")
+model = WhisperForConditionalGeneration.from_pretrained("WindyWord/listen-windy-pro-engine", subfolder="safetensors")
+# ig (Igbo), mn (Mongolian), or any thin-coverage language:
+ids = model.generate(input_features, language="ig", task="transcribe")
+```
+Languages currently flagged for this fallback:
+- **Igbo (ig)** — community ASR thin; only available fine-tune is whisper-tiny which is 39M params.
+- **Mongolian (mn)** — both predecessor and upgrade attempts have audited at ~100% WER on FLEURS.
+- **Hebrew (he)**, **Malayalam (ml)** — current language-specific models are MARGINAL; whisper-large-v3 may give better real-world results.
 ## Profile
 - **Architecture:** 1.55B params · whisper-large-v3
 - **Profile:** premium / max accuracy
 - **Base model:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
+- **Multilingual:** 99 languages directly supported; auto-detects language by default
 ## Variants in this repo
 For CPU inference via CTranslate2:
 ```python
 import ctranslate2
 model = ctranslate2.models.Whisper("path/to/ct2-int8/")
 ```