How can we get the position of text in the generated audio?
#12, opened by maifeeulasad
It's really cool that we can now generate audio in real time with microsoft/VibeVoice-Realtime-0.5B. I was thinking about integrating it into my application, and then I ran into a critical UX requirement: it would be great if we could highlight the text that corresponds to the audio currently being played.
Does VibeVoice support this?
Opened an issue: https://github.com/microsoft/VibeVoice/issues/144
Thank you for your interest. Currently, the model cannot provide alignment information between generated speech and text.