-
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Paper • 2310.16818 • Published • 33 -
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 56 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 62 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 73
Collections
Collections including paper arxiv:2512.24880
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 29 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 14 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 50 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 33
-
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
Paper • 2602.05400 • Published • 355 -
PaperBanana: Automating Academic Illustration for AI Scientists
Paper • 2601.23265 • Published • 228 -
FASA: Frequency-aware Sparse Attention
Paper • 2602.03152 • Published • 154 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 328
-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 221 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 30 -
Learning to Reason in 13 Parameters
Paper • 2602.04118 • Published • 6 -
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Paper • 2405.17604 • Published • 3
-
Your Group-Relative Advantage Is Biased
Paper • 2601.08521 • Published • 158 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 141 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 328 -
BitNet Distillation
Paper • 2510.13998 • Published • 61
-
The Trinity of Consistency as a Defining Principle for General World Models
Paper • 2602.23152 • Published • 202 -
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
Paper • 2602.22859 • Published • 150 -
OmniGAIA: Towards Native Omni-Modal AI Agents
Paper • 2602.22897 • Published • 53 -
Imagination Helps Visual Reasoning, But Not Yet in Latent Space
Paper • 2602.22766 • Published • 44
-
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 94 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 42 -
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Paper • 2512.23705 • Published • 45 -
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper • 2512.19995 • Published • 16
-
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 328 -
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper • 2512.23988 • Published • 19 -
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
Paper • 2512.25075 • Published • 16 -
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Paper • 2512.24176 • Published • 8