SALMONN Audio Questioning
Deeply interrogate audio file content
All the building blocks you might need to combine for great AV understanding
Long video understanding with smart attention
Extraction & Reconstruction for Efficient Speech Separation
Detect and split video scenes into separate clips
Gaze Target Estimation
Dense Grounded Understanding of Images and Videos
Answer questions about uploaded audio or YouTube videos
Image and video tasks with moondream3.