On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 8, 2025 • 38
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 101
From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR Paper • 2508.07534 • Published Aug 11, 2025 • 1