About

Multilingual DistilWhisper improves ASR performance in target languages by adding lightweight CLSR (conditional language-specific routing) modules on top of whisper-small. These modules are trained with a combination of a cross-entropy (ASR) loss and a knowledge distillation loss, with whisper-large-v2 as the teacher. More details are in the ICASSP 2024 paper: arxiv.org/abs/2311.01070
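The training objective described above can be illustrated with a small, dependency-free sketch for a single output token. This is not the authors' implementation: the choice of KL divergence as the distillation term, the weighted-sum form, and the names `distillation_loss` and `alpha` are assumptions made for illustration only. The actual objective is defined in the paper and the linked repository.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """ASR term: negative log-probability the student gives the reference token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])


def kl_divergence(teacher_logits, student_logits):
    """Distillation term: KL(teacher || student).

    Assumption: KL is used here as a stand-in; the divergence actually
    used by the authors may differ.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(student_logits, teacher_logits, target_index, alpha=0.5):
    """Hypothetical combined loss: cross-entropy plus alpha times the KD term."""
    ce = cross_entropy(student_logits, target_index)
    kd = kl_divergence(teacher_logits, student_logits)
    return ce + alpha * kd


if __name__ == "__main__":
    # Toy 3-token vocabulary; the reference token has index 0.
    student = [2.0, 0.5, -1.0]   # e.g. whisper-small with CLSR modules
    teacher = [3.0, 0.0, -2.0]   # e.g. whisper-large-v2
    print(distillation_loss(student, teacher, target_index=0))
```

When the student's distribution matches the teacher's, the distillation term vanishes and only the cross-entropy term remains, which is why the two terms can be balanced with a single weight in this sketch.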

Inference

Code for training and inference is available at: https://github.com/naver/multilingual-distilwhisper

Citation

@inproceedings{ferraz2024distilwhisper,
  title={Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts},
  author={Ferraz, Thomas Palmeira and Boito, Marcely Zanon and Brun, Caroline and Nikoulina, Vassilina},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2024},
  organization={IEEE}
}


Model size: 0.2B params (Safetensors, tensor type F32)
