Papers to Read
mDPO: Conditional Preference Optimization for Multimodal Large Language Models (arXiv:2406.11839)
Pandora: Towards General World Model with Natural Language Actions and Video States (arXiv:2406.09455)
WPO: Enhancing RLHF with Weighted Preference Optimization (arXiv:2406.11827)
In-Context Editing: Learning Knowledge from Self-Induced Distributions (arXiv:2406.11194)
Breaking the Attention Bottleneck (arXiv:2406.10906)
Deep Bayesian Active Learning for Preference Modeling in Large Language Models (arXiv:2406.10023)
RVT-2: Learning Precise Manipulation from Few Demonstrations (arXiv:2406.08545)
arXiv:2406.09414
Transformers meet Neural Algorithmic Reasoners (arXiv:2406.09308)
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (arXiv:2406.07522)
MotionClone: Training-Free Motion Cloning for Controllable Video Generation (arXiv:2406.05338)
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning (arXiv:2406.06469)
RePLan: Robotic Replanning with Perception and Language Models (arXiv:2401.04157)
Generative Expressive Robot Behaviors using Large Language Models (arXiv:2401.14673)
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms (arXiv:2406.02900)
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs (arXiv:2406.02886)
Self-Improving Robust Preference Optimization (arXiv:2406.01660)
MotionLLM: Understanding Human Behaviors from Human Motions and Videos (arXiv:2405.20340)
Offline Regularised Reinforcement Learning for Large Language Models Alignment (arXiv:2405.19107)
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF (arXiv:2405.19320)
An Introduction to Vision-Language Modeling (arXiv:2405.17247)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (arXiv:2405.11143)
Octo: An Open-Source Generalist Robot Policy (arXiv:2405.12213)
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction (arXiv:2405.10315)
RLHF Workflow: From Reward Modeling to Online RLHF (arXiv:2405.07863)
Self-Play Preference Optimization for Language Model Alignment (arXiv:2405.00675)
Iterative Reasoning Preference Optimization (arXiv:2404.19733)
KAN: Kolmogorov-Arnold Networks (arXiv:2404.19756)
A Multimodal Automated Interpretability Agent (arXiv:2404.14394)
Learning H-Infinity Locomotion Control (arXiv:2404.14405)
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment (arXiv:2404.12318)
Scaling Instructable Agents Across Many Simulated Worlds (arXiv:2404.10179)
Learn Your Reference Model for Real Good Alignment (arXiv:2404.09656)
Dataset Reset Policy Optimization for RLHF (arXiv:2404.08495)
UniFL: Improve Stable Diffusion via Unified Feedback Learning (arXiv:2404.05595)
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv:2404.03715)
Robust Gaussian Splatting (arXiv:2404.04211)
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation (arXiv:2404.03673)