LLM Reasoning - a Giuliano Collection

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25, 2024 • 45

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning

Paper • 2410.02884 • Published Oct 3, 2024 • 54

Tree of Problems: Improving structured problem solving with compositionality

Paper • 2410.06634 • Published Oct 9, 2024 • 8

Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 93

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Paper • 2407.21787 • Published Jul 31, 2024 • 13

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6, 2024 • 63

QwQ-32B-Preview

🔍

923

QwQ-32B-Preview

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

Paper • 2411.07279 • Published Nov 11, 2024 • 4

Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

Paper • 2410.18451 • Published Oct 24, 2024 • 20

Skywork/Skywork-Reward-Gemma-2-27B-v0.2

Text Classification • 27B • Updated Oct 25, 2024 • 725 • 33

Generative Verifiers: Reward Modeling as Next-Token Prediction

Paper • 2408.15240 • Published Aug 27, 2024 • 13

Understanding Hidden Computations in Chain-of-Thought Reasoning

Paper • 2412.04537 • Published Dec 5, 2024

Generative Reward Models

Paper • 2410.12832 • Published Oct 2, 2024 • 7

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Paper • 2410.02089 • Published Oct 2, 2024 • 13

V-STaR: Training Verifiers for Self-Taught Reasoners

Paper • 2402.06457 • Published Feb 9, 2024 • 9

RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

Paper • 2412.12881 • Published Dec 17, 2024 • 2

Reinforcement Learning Enhanced LLMs: A Survey

Paper • 2412.10400 • Published Dec 5, 2024

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Paper • 2412.14135 • Published Dec 18, 2024

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Paper • 2412.11605 • Published Dec 16, 2024 • 18

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

Paper • 2501.01904 • Published Jan 3, 2025 • 33

Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search

Paper • 2411.11694 • Published Nov 18, 2024

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Paper • 2408.16737 • Published Aug 29, 2024 • 1

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8, 2025 • 288

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though

Paper • 2501.04682 • Published Jan 8, 2025 • 99

Search-o1: Agentic Search-Enhanced Large Reasoning Models

Paper • 2501.05366 • Published Jan 9, 2025 • 102

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published Jan 16, 2025 • 41

Foundations of Large Language Models

Paper • 2501.09223 • Published Jan 16, 2025 • 13

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13, 2025 • 100

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Text Generation • 33B • Updated Feb 24, 2025 • 825k • • 1.52k

Reasoning Language Models: A Blueprint

Paper • 2501.11223 • Published Jan 20, 2025 • 33

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 440

Qwen2.5-1M Technical Report

Paper • 2501.15383 • Published Jan 26, 2025 • 72

Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective

Paper • 2501.11110 • Published Jan 19, 2025 • 4

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 124

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31, 2025 • 124

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3, 2025 • 62

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

Paper • 2502.01718 • Published Feb 3, 2025 • 28

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24, 2025 • 28

Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23

LIMO: Less is More for Reasoning

Paper • 2502.03387 • Published Feb 5, 2025 • 62

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 255

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published Jan 2, 2025 • 51

simplescaling/s1K

Viewer • Updated Feb 11, 2025 • 1k • 839 • 236

Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

Paper • 2502.04404 • Published Feb 6, 2025 • 25

Agency Is Frame-Dependent

Paper • 2502.04403 • Published Feb 6, 2025 • 23

CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Paper • 2502.04350 • Published Feb 4, 2025 • 11

QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

Paper • 2502.02584 • Published Feb 4, 2025 • 16

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10, 2025 • 152

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7, 2025 • 152

Competitive Programming with Large Reasoning Models

Paper • 2502.06807 • Published Feb 3, 2025 • 69

On the Emergence of Thinking in LLMs I: Searching for the Right Intuition

Paper • 2502.06773 • Published Feb 10, 2025 • 1

LLM Pretraining with Continuous Concepts

Paper • 2502.08524 • Published Feb 12, 2025 • 30