DAMO-YOLO : A Report on Real-Time Object Detection Design Paper • 2211.15444 • Published Nov 23, 2022
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference Paper • 2410.04417 • Published Oct 6, 2024 • 1
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders Paper • 2501.01709 • Published Jan 3
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Paper • 2502.21257 • Published Feb 28 • 2
Unveiling the Tapestry of Consistency in Large Vision-Language Models Paper • 2405.14156 • Published May 23, 2024
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding Paper • 2504.01407 • Published Apr 2 • 1
TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning Paper • 2511.05489 • Published Nov 7 • 2