gupta-tanish/llama-off-policy-qwq-10k-perturbation-iter1 Text Generation • 8B • Updated Jul 8, 2025 • 3
gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-1.0-iteration2 Text Generation • 8B • Updated Jun 9, 2025 • 6
gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-20.0-iteration1 Text Generation • 8B • Updated Jun 8, 2025 • 8
gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-1.0-eos-increase-iteration2-lamda-0.1 Text Generation • 8B • Updated Jun 7, 2025 • 5
gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-0.1-eos-increase-iteration2 Text Generation • 8B • Updated Jun 7, 2025 • 6
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.001-lr-1e-6-iteration1 Text Generation • 8B • Updated Jun 7, 2025 • 4
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.01-lr-1e-6-iteration1 Text Generation • 8B • Updated Jun 7, 2025 • 5
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.1-lr-1e-6-iteration1 Text Generation • 8B • Updated Jun 6, 2025 • 5
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-1.0-lr-1e-6-iteration1 Text Generation • 8B • Updated Jun 6, 2025 • 8
gupta-tanish/llama3-8b-instruct-on-policy-mpo-iteration2-v3 Text Generation • 8B • Updated Apr 24, 2025 • 5
gupta-tanish/llama3-8b-instruct-on-policy-mpo-iteration1-v3 Text Generation • 8B • Updated Apr 22, 2025 • 6
gupta-tanish/llama3-8b-instruct-on-policy-mpo-iteration1-v2 Text Generation • 8B • Updated Apr 20, 2025 • 5
gupta-tanish/llama3-8b-instruct-on-policy-mpo-iteration2 Text Generation • 8B • Updated Apr 17, 2025 • 5
gupta-tanish/llama3-8b-instruct-on-policy-mpo-iteration1 Text Generation • 8B • Updated Apr 17, 2025 • 8
gupta-tanish/mistral-instruct-v0.2-on-policy-mpo-iteration2 Text Generation • 7B • Updated Apr 16, 2025 • 6
gupta-tanish/mistral-instruct-v0.2-on-policy-mpo-iteration1 Text Generation • 7B • Updated Apr 15, 2025 • 6