The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 5 days ago • 193
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models Paper • 2602.22859 • Published 5 days ago • 148
Imagination Helps Visual Reasoning, But Not Yet in Latent Space Paper • 2602.22766 • Published 5 days ago • 38
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization Paper • 2602.23008 • Published 5 days ago • 33
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning Paper • 2602.23258 • Published 5 days ago • 27
Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization Paper • 2602.22675 • Published 5 days ago • 20
AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games Paper • 2602.17594 • Published 12 days ago • 9
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published 22 days ago • 255
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published 20 days ago • 216
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 26 days ago • 342
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published about 1 month ago • 307
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published 22 days ago • 274
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published 21 days ago • 237
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 228