sneakyfree committed on
Commit 2ca9457 · verified · 1 Parent(s): 633b9ed

Add multilingual-fallback callout for thin-coverage languages

Files changed (1)
  1. README.md +21 -2
README.md CHANGED
@@ -6,6 +6,7 @@ tags:
  - windyword
  - english
  - multilingual
+ - multilingual-fallback
  library_name: transformers
  pipeline_tag: automatic-speech-recognition
  language:
@@ -15,13 +16,32 @@ language:
 
  # WindyWord.ai STT — Windy Pro Engine
 
- **Multilingual speech-to-text engine. Transcribes audio in 100+ languages, with English as the primary trained domain.**
+ **The flagship multilingual speech-to-text engine. Transcribes audio in 99+ languages with state-of-the-art quality.**
+
+ ## Recommended fallback for low-resource languages
+
+ This is the **multilingual fallback model** for the WindyWord STT fleet. When a language-specific Lingua model is missing or underperforms (we explicitly flag these in the language-specific READMEs), production users should route through this model with the appropriate `language=` hint:
+
+ ```python
+ from transformers import WhisperForConditionalGeneration, WhisperProcessor
+ processor = WhisperProcessor.from_pretrained("WindyWord/listen-windy-pro-engine", subfolder="safetensors")
+ model = WhisperForConditionalGeneration.from_pretrained("WindyWord/listen-windy-pro-engine", subfolder="safetensors")
+
+ # ig (Igbo), mn (Mongolian), or any thin-coverage language:
+ ids = model.generate(input_features, language="ig", task="transcribe")
+ ```
+
+ Languages currently flagged for this fallback:
+ - **Igbo (ig)** — community ASR thin; only available fine-tune is whisper-tiny which is 39M params.
+ - **Mongolian (mn)** — both predecessor and upgrade attempts have audited at ~100% WER on FLEURS.
+ - **Hebrew (he)**, **Malayalam (ml)** — current language-specific models are MARGINAL; whisper-large-v3 may give better real-world results.
 
  ## Profile
 
  - **Architecture:** 1.55B params · whisper-large-v3
  - **Profile:** premium / max accuracy
  - **Base model:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
+ - **Multilingual:** 99 languages directly supported; auto-detects language by default
 
  ## Variants in this repo
 
@@ -43,7 +63,6 @@ model = WhisperForConditionalGeneration.from_pretrained("WindyWord/listen-windy-
  For CPU inference via CTranslate2:
  ```python
  import ctranslate2
- # After downloading the ct2-int8 subfolder:
  model = ctranslate2.models.Whisper("path/to/ct2-int8/")
  ```
 
68