• The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887)
• Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv:2404.02258)
• ReFT: Representation Finetuning for Language Models (arXiv:2404.03592)
• Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv:2404.03715)
• Better & Faster Large Language Models via Multi-token Prediction (arXiv:2404.19737)
• Chameleon: Mixed-Modal Early-Fusion Foundation Models (arXiv:2405.09818)
• MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series (arXiv:2405.19327)
• Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing (arXiv:2406.08464)
• Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems (arXiv:2407.01370)
• Searching for Best Practices in Retrieval-Augmented Generation (arXiv:2407.01219)
• DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models (arXiv:2309.03883)
• Lynx: An Open Source Hallucination Evaluation Model (arXiv:2407.08488)
• Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (arXiv:2408.03314)
• Writing in the Margins: Better Inference Pattern for Long Context Retrieval (arXiv:2408.14906)
• Human Feedback is not Gold Standard (arXiv:2309.16349)
• arXiv:2410.05258
• arXiv:2410.01201