A Subgoal-driven Framework for Improving Long-Horizon LLM Agents Paper • 2603.19685 • Published 3 days ago • 5
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models Paper • 2603.18002 • Published 4 days ago • 9
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding Paper • 2603.19235 • Published 3 days ago • 86
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs Paper • 2603.18004 • Published 4 days ago • 12
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models Paper • 2603.17117 • Published 5 days ago • 82
MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation Paper • 2603.16861 • Published 5 days ago • 4
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published 6 days ago • 175
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation Paper • 2603.16871 • Published 5 days ago • 58
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 6 days ago • 143
VQQA: An Agentic Approach for Video Evaluation and Quality Improvement Paper • 2603.12310 • Published 10 days ago • 7
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published 10 days ago • 90
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness Paper • 2603.09200 • Published 13 days ago • 5
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports Paper • 2603.09896 • Published 12 days ago • 26
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published 13 days ago • 52
Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces Paper • 2603.06713 • Published 17 days ago • 16