WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 9 days ago • 45
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis Paper • 2605.14392 • Published 6 days ago • 7
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards Paper • 2605.10899 • Published 9 days ago • 74
Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning Paper • 2605.10923 • Published 9 days ago • 13
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published 16 days ago • 119
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 324
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 628
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 364
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data Paper • 2604.19859 • Published 29 days ago • 51
SWE-chat: Coding Agent Interactions From Real Users in the Wild Paper • 2604.20779 • Published 28 days ago • 15
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 28 days ago • 240
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published 30 days ago • 84
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Paper • 2604.18224 • Published 30 days ago • 22
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents Paper • 2604.18543 • Published 30 days ago • 30
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 103
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 290
MARS: Enabling Autoregressive Models Multi-Token Generation Paper • 2604.07023 • Published Apr 8 • 38