Active filters: dpo, trl
HumanLLMs/Human-Like-LLama3-8B-Instruct
Text Generation
• 8B • Updated • 222
• 25
HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407
Text Generation
• 12B • Updated • 102
• 27
lewtun/zephyr-7b-dpo-full
Text Generation
• 7B • Updated • 3
alignment-handbook/zephyr-7b-dpo-full
Text Generation
• 7B • Updated • 17
• 3
alignment-handbook/zephyr-7b-dpo-qlora
Updated • 9
• 9
amirali1985/gpt-neo-125m_hh_reward
Text Generation
• 0.1B • Updated • 75
lewtun/zephyr-7b-dpo-qlora
sambar/zephyr-7b-ipo-lora
Text Generation
• Updated • 2
nikkoyabut/merged_model_dpo
sambar/zephyr-7b-ipo-lora-5ep
Text Generation
• Updated • 1
alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2-dpo
Text Generation
• 1B • Updated • 4
• 2
Yaxin1992/mixtral-dpo-1000
adhi29/openhermes-mistral-dpo-gptq
Updated
Text Generation
• 1.03M • Updated • 5
ybelkada/test-tags-model-2
Text Generation
• 1.03M • Updated • 2
justinj92/dpoplatypus-phi2
Text Generation
• 3B • Updated
lewtun/zephyr-7b-dpo-qlora-8e0975a
akashkumarbtc/openhermes-mistral-dpo-gptq
Updated
darshan8950/openhermes-mistral-dpo-gptq
Updated
ondevicellm/zephyr-7b-dpo-full
Text Generation
• 7B • Updated • 5
jdang/openhermes-mistral-dpo-gptq
Updated
winglian/zephyr-deita-dpo
winglian/zephyr-deita-kto
winglian/zephyr-deita-kto-3ep