V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval Paper • 2602.06034 • Published 9 days ago • 8
HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models Paper • 2601.15968 • Published 23 days ago • 7
VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery Paper • 2509.17191 • Published Sep 21, 2025 • 1
3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence Paper • 2601.06496 • Published Jan 10 • 1
DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Paper • 2510.15264 • Published Oct 17, 2025 • 4
EgoLCD: Egocentric Video Generation with Long Context Diffusion Paper • 2512.04515 • Published Dec 4, 2025 • 6 • 2
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation Paper • 2511.22973 • Published Nov 28, 2025 • 6 • 2
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots Paper • 2511.17889 • Published Nov 22, 2025 • 5
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation Paper • 2511.20714 • Published Nov 25, 2025 • 49