Papers
• Detecting Pretraining Data from Large Language Models (arXiv:2310.16789)
• Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models (arXiv:2310.13671)
• AutoMix: Automatically Mixing Language Models (arXiv:2310.12963)
• An Emulator for Fine-Tuning Large Language Models using Small Language Models (arXiv:2310.12962)
• In-Context Pretraining: Language Modeling Beyond Document Boundaries (arXiv:2310.10638)
• Zephyr: Direct Distillation of LM Alignment (arXiv:2310.16944)
• Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model (arXiv:2310.09520)
• DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (arXiv:2310.03714)
• Efficient Streaming Language Models with Attention Sinks (arXiv:2309.17453)
• MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models (arXiv:2309.12284)
• Chain-of-Verification Reduces Hallucination in Large Language Models (arXiv:2309.11495)
• Knowledge Distillation of Large Language Models (arXiv:2306.08543)
• A Repository of Conversational Datasets (arXiv:1904.06472)
• SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine (arXiv:1704.05179)
• Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
• Efficient Few-Shot Learning Without Prompts (arXiv:2209.11055)
• Attention Is All You Need (arXiv:1706.03762)
• Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (arXiv:2005.11401)
• FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv:2205.14135)
• Textbooks Are All You Need (arXiv:2306.11644)
• Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290)
• The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants (arXiv:2308.16884)
• Retentive Network: A Successor to Transformer for Large Language Models (arXiv:2307.08621)
• PockEngine: Sparse and Efficient Fine-tuning in a Pocket (arXiv:2310.17752)
• Contrastive Decoding: Open-ended Text Generation as Optimization (arXiv:2210.15097)
• Contrastive Decoding Improves Reasoning in Large Language Models (arXiv:2309.09117)
• Efficient Memory Management for Large Language Model Serving with PagedAttention (arXiv:2309.06180)
• DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models (arXiv:2309.03883)
• Controlled Decoding from Language Models (arXiv:2310.17022)
• Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) (arXiv:2309.08968)
• Learning From Mistakes Makes LLM Better Reasoner (arXiv:2310.20689)
• Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (arXiv:2310.11511)
• YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
• DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales (arXiv:2308.01320)
• Shepherd: A Critic for Language Model Generation (arXiv:2308.04592)
• GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (arXiv:2305.13245)
• Improving Large Language Model Fine-tuning for Solving Math Problems (arXiv:2310.10047)
• Dialogue Act Classification with Context-Aware Self-Attention (arXiv:1904.02594)
• It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations (arXiv:2005.04364)
• Question rewriting? Assessing its importance for conversational question answering (arXiv:2201.09146)
• Can Question Rewriting Help Conversational Question Answering? (arXiv:2204.06239)
• Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (arXiv:2311.00430)
• (title missing) (arXiv:2310.20707)
• When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale (arXiv:2309.04564)
• FlashDecoding++: Faster Large Language Model Inference on GPUs (arXiv:2311.01282)
• Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (arXiv:2311.06243)
• Prompt Cache: Modular Attention Reuse for Low-Latency Inference (arXiv:2311.04934)
• Co-training and Co-distillation for Quality Improvement and Compression of Language Models (arXiv:2311.02849)
• NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation (arXiv:2310.19820)
• LLM in a flash: Efficient Large Language Model Inference with Limited Memory (arXiv:2312.11514)
• RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (arXiv:2404.07839)