sunmaxim (sun)

upvoted a collection 9 months ago

InternVL3

Collection

34 items • Updated Sep 28, 2025 • 83

upvoted 2 articles 9 months ago

Article

Reasoning Datasets Competition

Apr 9, 2025 • 38

Article

Welcome Llama 4 Maverick & Scout on Hugging Face

Apr 5, 2025 • 146

upvoted a paper 9 months ago

Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31, 2025 • 24

upvoted an article 9 months ago

Article

Training Large Language Models with Interpreter Feedback using WebAssembly

Apr 3, 2025 • 14

upvoted a paper 10 months ago

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

Paper • 2503.10460 • Published Mar 13, 2025 • 29

upvoted a collection 10 months ago

OLMo 2

Collection

Artifacts for the OLMo 2 release. • 35 items • Updated 9 days ago • 151

upvoted 3 articles 10 months ago

Article

Open R1: Update #3

Mar 11, 2025 • 296

Article

Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2

Aug 21, 2024 • 41

Article

Dual-Stream Parallelism (DualPipe): Better Without the Dual Streams

Feb 28, 2025 • 7

upvoted a paper 10 months ago

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Paper • 2502.14768 • Published Feb 20, 2025 • 47

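upvoted 3 papers 11 months ago

The Curse of Depth in Large Language Models

Cautious Optimizers: Improving Training with One Line of Code

On Teacher Hacking in Language Model Distillation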

upvoted an article 11 months ago

Article

Open-source DeepResearch – Freeing our search agents

Feb 4, 2025 • 1.31k

upvoted a collection 11 months ago

high-quality Chinese training datasets

Collection

A suite of high-quality Chinese datasets for pretraining, fine-tuning, or preference alignment, along with the models trained on them. • 13 items • Updated May 22, 2025 • 23

upvoted an article 11 months ago

Article

What is test-time compute and how to scale it?

Feb 6, 2025 • 110

upvoted 3 papers about 1 year ago

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Paper • 2407.21787 • Published Jul 31, 2024 • 13

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25, 2024 • 45
