Shengyi Costa Huang

vwxyzjn

http://costa.sh

AI & ML interests

None yet

vwxyzjn's collections (4)

Async RLHF Paper Checkpoints

Checkpoints for "Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models" https://arxiv.org/abs/2410.18252

vwxyzjn/online_dpo_async
Updated Feb 5, 2025 • 1

vwxyzjn/online_dpo_sync
Updated Feb 5, 2025 • 2

vwxyzjn/ppo_async
Updated Feb 5, 2025 • 2

vwxyzjn/ppo_sync
Updated Feb 5, 2025 • 3

TL;DR summarization checkpoints

These checkpoints were trained as part of https://arxiv.org/abs/2403.17031 and are taken from https://wandb.ai/costa-huang/tldr_summarize/reports/Release--Vmlldzo3MT

cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr
Text Generation • Updated May 15, 2024 • 2.73k

cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr
Text Classification • Updated May 15, 2024 • 1.82k

cleanrl/EleutherAI_pythia-2.8b-deduped__sft__tldr
Text Generation • Updated May 15, 2024 • 4

cleanrl/EleutherAI_pythia-2.8b-deduped__reward__tldr
Text Classification • Updated May 15, 2024 • 6

lm-human-preference-details

vwxyzjn/train_policy_accelerate__sentiment_offline_5k.json__seed1__1696447674
Text Generation • 0.1B • Updated Oct 4, 2023 • 5

lm-human-preference-details/train_policy_accelerate__sentiment_offline_5k.json__seed1
Text Generation • 0.1B • Updated Oct 4, 2023 • 6

RLOO / PPOv2 TL;DR summarize checkpoints

vwxyzjn/ppo_tldr
Text Generation • 1B • Updated May 24, 2024 • 10 • 1

vwxyzjn/ppo_tldr_6.9b
Text Generation • 7B • Updated Jun 7, 2024 • 1

vwxyzjn/rloo_tldr
Text Generation • 1B • Updated Jun 11, 2024 • 8

vwxyzjn/rloo_tldr_6.9b
Text Generation • 7B • Updated Jun 7, 2024
