lmms-lab/RefCOCOg · 12.6k · 1.45k · 9
Feeling and building multimodal intelligence.
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling