LLM Reasoning
STaR: Bootstrapping Reasoning With Reasoning
Paper
• 2203.14465
• Published
• 9
Let's Verify Step by Step
Paper
• 2305.20050
• Published
• 11
Training Large Language Models to Reason in a Continuous Latent Space
Paper
• 2412.06769
• Published
• 94
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper
• 2411.14405
• Published
• 61
Alphazero-like Tree-Search can Guide Large Language Model Decoding and
Training
Paper
• 2309.17179
• Published
• 2
Qwen2.5 Technical Report
Paper
• 2412.15115
• Published
• 377
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Paper
• 2410.13639
• Published
• 19
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple
Distillation, Big Progress or Bitter Lesson?
Paper
• 2411.16489
• Published
• 45
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level
Mathematical Reasoning
Paper
• 2410.02884
• Published
• 54
Tree of Problems: Improving structured problem solving with
compositionality
Paper
• 2410.06634
• Published
• 8
Are Your LLMs Capable of Stable Reasoning?
Paper
• 2412.13147
• Published
• 93
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Paper
• 2407.21787
• Published
• 13
Scaling LLM Test-Time Compute Optimally can be More Effective than
Scaling Model Parameters
Paper
• 2408.03314
• Published
• 63
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper
• 2412.16145
• Published
• 38
The Surprising Effectiveness of Test-Time Training for Abstract
Reasoning
Paper
• 2411.07279
• Published
• 4
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Paper
• 2410.18451
• Published
• 20
Skywork/Skywork-Reward-Gemma-2-27B-v0.2
Text Classification
• 27B • Updated
• 725
• 33
Generative Verifiers: Reward Modeling as Next-Token Prediction
Paper
• 2408.15240
• Published
• 13
Understanding Hidden Computations in Chain-of-Thought Reasoning
Paper
• 2412.04537
• Published
Paper
• 2410.12832
• Published
• 7
B-STaR: Monitoring and Balancing Exploration and Exploitation in
Self-Taught Reasoners
Paper
• 2412.17256
• Published
• 47
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement
Learning
Paper
• 2410.02089
• Published
• 13
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper
• 2402.06457
• Published
• 9
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented
Verification and Refinement
Paper
• 2412.12881
• Published
• 2
Reinforcement Learning Enhanced LLMs: A Survey
Paper
• 2412.10400
• Published
Scaling of Search and Learning: A Roadmap to Reproduce o1 from
Reinforcement Learning Perspective
Paper
• 2412.14135
• Published
SPaR: Self-Play with Tree-Search Refinement to Improve
Instruction-Following in Large Language Models
Paper
• 2412.11605
• Published
• 18
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
• 2501.01904
• Published
• 33
Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search
Paper
• 2411.11694
• Published
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal
Sampling
Paper
• 2408.16737
• Published
• 1
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep
Thinking
Paper
• 2501.04519
• Published
• 288
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta
Chain-of-Thought
Paper
• 2501.04682
• Published
• 99
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language
Models
Paper
• 2501.03262
• Published
• 104
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
Large Language Models
Paper
• 2501.09686
• Published
• 41
Foundations of Large Language Models
Paper
• 2501.09223
• Published
• 13
The Lessons of Developing Process Reward Models in Mathematical
Reasoning
Paper
• 2501.07301
• Published
• 100
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Text Generation
• 33B • Updated
• 825k
• 1.52k
Reasoning Language Models: A Blueprint
Paper
• 2501.11223
• Published
• 33
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
Paper
• 2501.12948
• Published
• 440
Qwen2.5-1M Technical Report
Paper
• 2501.15383
• Published
• 72
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large
Language Models via a Multi-Paradigm Perspective
Paper
• 2501.11110
• Published
• 4
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model
Post-training
Paper
• 2501.17161
• Published
• 124
s1: Simple test-time scaling
Paper
• 2501.19393
• Published
• 124
Process Reinforcement through Implicit Rewards
Paper
• 2502.01456
• Published
• 62
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
Paper
• 2502.01718
• Published
• 28
RL + Transformer = A General-Purpose Problem Solver
Paper
• 2501.14176
• Published
• 28
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper
• 2502.02339
• Published
• 23
LIMO: Less is More for Reasoning
Paper
• 2502.03387
• Published
• 62
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published
• 255
CodeElo: Benchmarking Competition-level Code Generation of LLMs with
Human-comparable Elo Ratings
Paper
• 2501.01257
• Published
• 51
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of
Language Models
Paper
• 2502.04404
• Published
• 25
Agency Is Frame-Dependent
Paper
• 2502.04403
• Published
• 23
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
Paper
• 2502.04350
• Published
• 11
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper
• 2502.02584
• Published
• 16
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
Scaling
Paper
• 2502.06703
• Published
• 152
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth
Approach
Paper
• 2502.05171
• Published
• 152
Competitive Programming with Large Reasoning Models
Paper
• 2502.06807
• Published
• 69
On the Emergence of Thinking in LLMs I: Searching for the Right
Intuition
Paper
• 2502.06773
• Published
• 1
LLM Pretraining with Continuous Concepts
Paper
• 2502.08524
• Published
• 30