Standard-format-preference-dataset - a RLHFlow Collection

RLHFlow 's Collections

Decision-Tree Reward Models

RLHFlow MATH Process Reward Model

Standard-format-preference-dataset

Mixture-of-preference-reward-modeling

RM-Bradley-Terry

RLHFLow Reward Models

Standard-format-preference-dataset

updated Mar 2

We collect the open-source datasets and process them into the standard format.

RLHFlow/UltraFeedback-preference-standard

Viewer • Updated Apr 27, 2024 • 340k • 176 • 14
RLHFlow/Helpsteer-preference-standard

Viewer • Updated Apr 27, 2024 • 37.1k • 35 • 6
RLHFlow/HH-RLHF-Helpful-standard

Viewer • Updated Apr 27, 2024 • 115k • 112 • 4
RLHFlow/Orca-distibalel-standard

Viewer • Updated Apr 28, 2024 • 6.93k • 37 • 1
RLHFlow/Capybara-distibalel-Filter-standard

Viewer • Updated Apr 28, 2024 • 14.8k • 43
RLHFlow/CodeUltraFeedback-standard

Viewer • Updated Apr 27, 2024 • 50.2k • 65 • 5
RLHFlow/UltraInteract-filtered-standard

Viewer • Updated Apr 28, 2024 • 162k • 617 • 2
RLHFlow/PKU-SafeRLHF-30K-standard

Viewer • Updated Apr 29, 2024 • 26.9k • 21 • 3
RLHFlow/Argilla-Math-DPO-standard

Viewer • Updated Apr 30, 2024 • 2.42k • 19 • 3
RLHFlow/Prometheus2-preference-standard

Viewer • Updated May 5, 2024 • 200k • 35 • 2
RLHFlow/SHP-standard

Viewer • Updated May 9, 2024 • 93.3k • 40
RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard

Viewer • Updated May 8, 2024 • 42.3k • 49 • 4