Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 29 days ago • 159
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2604.12374 • Published about 1 month ago • 36
Large Language Models Align with the Human Brain during Creative Thinking Paper • 2604.03480 • Published Apr 3 • 6
Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models Paper • 2604.02315 • Published Apr 3 • 5
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens Paper • 2603.23516 • Published Mar 6 • 49
SimpleGPT: Improving GPT via A Simple Normalization Strategy Paper • 2602.01212 • Published Feb 1 • 3 • 6
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality Paper • 2602.14080 • Published Feb 15 • 21
On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking Paper • 2602.16849 • Published Feb 18 • 7
2Mamba2Furious: Linear in Complexity, Competitive in Accuracy Paper • 2602.17363 • Published Feb 19 • 8
Preliminary sonification of ENSO using traditional Javanese gamelan scales Paper • 2602.14560 • Published Feb 16 • 1
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers Paper • 2602.15322 • Published Feb 17 • 11
DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels Paper • 2602.11715 • Published Feb 12 • 7 • 3
Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm Paper • 2602.11543 • Published Feb 12 • 6 • 4