Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers Paper • 2602.06079 • Published Feb 4 • 18
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published Jan 26 • 29
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 105
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2, 2025 • 188
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs Paper • 2505.13529 • Published May 18, 2025 • 11
Prompt-Driven LLM Safeguarding via Directed Representation Optimization Paper • 2401.18018 • Published Jan 31, 2024 • 1
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 109
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21, 2025 • 67