Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 210
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads Paper • 2505.15865 • Published May 21 • 4
SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL Paper • 2502.11438 • Published Feb 17 • 8
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published Jan 10 • 75
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published Dec 6, 2024 • 50
google/siglip-so400m-patch14-384 Zero-Shot Image Classification • 0.9B • Updated Sep 26, 2024 • 6.41M • 634