VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs Paper • 2603.23481 • Published Mar 24 • 7
Article 📌 Rethinking Multimodality from an Industry Perspective: Captioning Is Far More Important Than You Think Borise • Nov 29, 2025 • 3
CaptionQA: Is Your Caption as Useful as the Image Itself? Paper • 2511.21025 • Published Nov 26, 2025 • 28
Article SigLIP 2: A better multilingual vision language encoder +1 ariG23498, merve, qubvel-hf • Feb 21, 2025 • 212
Agile Continuous Jumping in Discontinuous Terrains Paper • 2409.10923 • Published Sep 17, 2024 • 12
Article Key Insights into the Law of Vision Representations in MLLMs Borise • Sep 2, 2024 • 20
HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption Paper • 2310.01779 • Published Oct 3, 2023 • 4