- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 5.68k • 1.24k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 354 • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63
Collections
Collections including paper arxiv:2506.22760

- VideoDeepResearch: Long Video Understanding With Agentic Tool Using
  Paper • 2506.10821 • Published • 19
- Jan-nano Technical Report
  Paper • 2506.22760 • Published • 9
- MMSearch-R1: Incentivizing LMMs to Search
  Paper • 2506.20670 • Published • 64
- WebSailor: Navigating Super-human Reasoning for Web Agent
  Paper • 2507.02592 • Published • 123

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4

- Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
  Paper • 2506.14702 • Published • 3
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
  Paper • 2506.13585 • Published • 273
- Scaling Test-time Compute for LLM Agents
  Paper • 2506.12928 • Published • 63
- A Survey on Latent Reasoning
  Paper • 2507.06203 • Published • 93