WindyWord.ai STT: Pashto Lingua (GPU (safetensors))

Transcribes Pashto speech (Indo-European > Indo-Iranian > Iranian).

Note: EXCELLENT tier when used correctly. Derived from ihanif/whisper-medium-pashto. Verified at WER 5.3% / CER 3.2% / script-match 99.2% on a 50-sample FLEURS ps_af subset when inference uses forced_decoder_ids, obtained from processor.get_decoder_prompt_ids(language='pashto', task='transcribe') and passed explicitly to model.generate(). With the convenience language= kwarg, the model can silently drop the Pashto token and hallucinate English-script output on ~30% of samples, which shows up as a 53.7% WER artifact. Always force the decoder prompt for Pashto inference.
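The script-match figure above suggests a cheap guard against the English-script failure mode. The card does not say how script-match is computed, so the following is only a minimal sketch under an assumed definition: the share of alphabetic characters whose Unicode name begins with ARABIC. The helper names and the 0.9 threshold are hypothetical, not part of any WindyWord tooling.

```python
import unicodedata


def arabic_script_ratio(text):
    """Return the fraction of alphabetic characters that are Arabic-script.

    Assumed definition: a character counts as Arabic-script when its
    Unicode name starts with "ARABIC" (this covers Pashto-specific letters).
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("ARABIC")
    )
    return arabic / len(letters)


def looks_like_pashto_script(text, threshold=0.9):
    """Flag transcripts that are mostly Arabic-script (threshold is an assumption)."""
    return arabic_script_ratio(text) >= threshold
```

A transcript that fails this check is a candidate for the dropped-language-token problem and is worth re-running with the decoder prompt forced.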

Quality

  • WER: not yet verified by the WindyWord harness. Imported from an upstream community fine-tune.

About this variant

This is the safetensors deployment format of our Pashto Lingua STT model. Load it via the safetensors/ subfolder.

Part of the WindyWord.ai STT fleet, covering 35+ languages that commercial speech-to-text APIs underserve, with proper dialect / script disclosures where they matter.

Usage

from transformers import WhisperForConditionalGeneration, WhisperProcessor
processor = WhisperProcessor.from_pretrained("WindyWord/listen-windy-lingua-ps", subfolder="safetensors")
model = WhisperForConditionalGeneration.from_pretrained("WindyWord/listen-windy-lingua-ps", subfolder="safetensors")
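The loading code stops short of the forced-prompt step that the note at the top of this card calls for. A minimal sketch of that step follows, assuming the model and processor objects loaded above and a mono waveform at 16 kHz (the rate Whisper feature extractors expect). The function name transcribe_pashto is hypothetical. How model.generate() treats forced_decoder_ids has changed across transformers releases, so this may need adjusting for the installed version.

```python
def transcribe_pashto(model, processor, audio, sampling_rate=16000):
    """Transcribe one mono waveform with the Pashto decoder prompt forced.

    `model` and `processor` are the loaded Whisper objects; `audio` is a
    1-D float array sampled at `sampling_rate` Hz. Hypothetical helper.
    """
    # Convert the raw waveform into log-mel input features.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    # Build the (position, token_id) pairs that pin language and task.
    forced_ids = processor.get_decoder_prompt_ids(
        language="pashto", task="transcribe"
    )
    # Pass the prompt explicitly instead of relying on the language= kwarg.
    predicted_ids = model.generate(
        inputs.input_features, forced_decoder_ids=forced_ids
    )
    # Decode token ids back to text, dropping special tokens.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Calling transcribe_pashto(model, processor, waveform) returns a single string. Resample the audio to 16 kHz before passing it in.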

Commercial Use

Visit windyword.ai for apps and API access.


Provenance & License

Weights derived from upstream community Whisper fine-tunes (see individual model card for exact lineage). Redistributed under Apache-2.0 (inherited).

Certified by Opus 4.6 Opus-Claw (Dr. C) on Veron-1 (RTX 5090, Mt Pleasant SC).
