-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 156
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 59
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 47
Collections including paper arxiv:2503.01933
-
Prompt-to-Leaderboard
Paper • 2502.14855 • Published • 7
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Paper • 2502.16894 • Published • 33
Generating Skyline Datasets for Data Science Models
Paper • 2502.11262 • Published • 7
Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge
Paper • 2502.12501 • Published • 6
-
On Domain-Specific Post-Training for Multimodal Large Language Models
Paper • 2411.19930 • Published • 30
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
Paper • 2503.02495 • Published • 9
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
Paper • 2503.01933 • Published • 13
-
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
Paper • 2503.01933 • Published • 13
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 515
Small Language Models: Survey, Measurements, and Insights
Paper • 2409.15790 • Published • 1
Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows
Paper • 2505.24189 • Published • 5
-
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Paper • 2311.11077 • Published • 29
Tensor Product Attention Is All You Need
Paper • 2501.06425 • Published • 91
LoRA: Low-Rank Adaptation of Large Language Models
Paper • 2106.09685 • Published • 61
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper • 2403.03853 • Published • 65
-
Self-Boosting Large Language Models with Synthetic Preference Data
Paper • 2410.06961 • Published • 16
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 380
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation
Paper • 2412.13649 • Published • 21
NeoBERT: A Next-Generation BERT
Paper • 2502.19587 • Published • 39