How to use naver/multilingual-distilwhisper-28k with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="naver/multilingual-distilwhisper-28k")

# Load model directly
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("naver/multilingual-distilwhisper-28k", dtype="auto")

Multilingual DistilWhisper improves ASR performance in target languages by adding lightweight CLSR (conditional language-specific routing) modules on top of whisper-small. These modules are trained with a mix of a cross-entropy (ASR) loss and a knowledge distillation loss, with whisper-large-v2 as the teacher. More details are in the ICASSP 2024 paper: https://arxiv.org/abs/2311.01070
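To make the training objective above more concrete, here is a minimal sketch of how a cross-entropy term and a distillation term could be combined for a single output token. It is not the authors' implementation: the function names, the `kd_weight` and `temperature` parameters, and the choice of KL divergence as the distillation term are all assumptions made for illustration. See the paper and the repository for the actual losses and weights.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """ASR term: negative log-probability of the reference token."""
    return -math.log(softmax(student_logits)[target_index])


def kl_divergence(teacher_probs, student_probs):
    """Distillation term (assumed here to be KL(teacher || student))."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )


def combined_loss(student_logits, teacher_logits, target_index,
                  kd_weight=1.0, temperature=1.0):
    """Hypothetical mix: cross-entropy plus a weighted distillation term."""
    ce = cross_entropy(student_logits, target_index)
    kd = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return ce + kd_weight * kd


# Toy vocabulary of three tokens; the reference token is index 0.
student = [2.0, 1.0, 0.1]   # logits from the small model (student)
teacher = [3.0, 0.5, 0.2]   # logits from the large model (teacher)
loss = combined_loss(student, teacher, target_index=0, kd_weight=0.5)
```

When the student's logits match the teacher's, the distillation term vanishes and only the cross-entropy term remains, so the extra term only penalizes disagreement with the teacher.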
Code for training and inference at: https://github.com/naver/multilingual-distilwhisper
@inproceedings{ferraz2024distilwhisper,
title={Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts},
author={Ferraz, Thomas Palmeira and Boito, Marcely Zanon and Brun, Caroline and Nikoulina, Vassilina},
booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2024},
organization={IEEE}
}