ML Theory
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain • Paper 2509.26507 • Published Sep 30 • 537
- Muon Outperforms Adam in Tail-End Associative Memory Learning • Paper 2509.26030 • Published Sep 30 • 19
- Why Language Models Hallucinate • Paper 2509.04664 • Published Sep 4 • 194
Tokens
- Is There a Case for Conversation Optimized Tokenizers in Large Language Models? • Paper 2506.18674 • Published Jun 23 • 8