Chain of thought
updated
NousResearch/Hermes-4-70B-FP8
Text Generation
•
71B
•
Updated
•
198
•
25
NousResearch/Hermes-4-405B-FP8
Text Generation
•
406B
•
Updated
•
452
•
20
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Paper
•
2508.21365
•
Published
•
29
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Paper
•
2509.15566
•
Published
•
14
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
Paper
•
2510.19363
•
Published
•
61
π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
Paper
•
2510.25889
•
Published
•
65
moonshotai/Kimi-K2-Thinking
Text Generation
•
Updated
•
290k
•
1.6k
The Path Not Taken: RLVR Provably Learns Off the Principals
Paper
•
2511.08567
•
Published
•
33
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
Paper
•
2511.09515
•
Published
•
18
open-thoughts/OpenThinker-Agent-v1
Text Generation
•
8B
•
Updated
•
1.38k
•
89