LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation Paper • 2511.03001 • Published Nov 4, 2025 • 47
Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces Paper • 2510.06953 • Published Oct 8, 2025 • 9
One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL Paper • 2506.02338 • Published Jun 3, 2025 • 5
Interleaved Reasoning for Large Language Models via Reinforcement Learning Paper • 2505.19640 • Published May 26, 2025 • 15
Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance Paper • 2505.16348 • Published May 22, 2025 • 52
RLVR-World: Training World Models with Reinforcement Learning Paper • 2505.13934 • Published May 20, 2025 • 16
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents Paper • 2505.15277 • Published May 21, 2025 • 104
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 47
Large Language Model-Brained GUI Agents: A Survey Paper • 2411.18279 • Published Nov 27, 2024 • 30
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published Oct 17, 2024 • 44
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code Paper • 2409.19715 • Published Sep 29, 2024 • 10
VerifiNER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models Paper • 2402.18374 • Published Feb 28, 2024 • 2
Coffee: Boost Your Code LLMs by Fixing Bugs with Feedback Paper • 2311.07215 • Published Nov 13, 2023 • 3
Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents Paper • 2310.09343 • Published Oct 13, 2023 • 2