PSFT
PSFT+RL models
wh-zhu/Qwen2.5-7B-PSFT-RL-DAPO-90 • 8B • Updated Aug 12, 2025
wh-zhu/Qwen2.5-7B-Instruct-PSFT-1300 • 8B • Updated Jul 26, 2025
wh-zhu/Qwen2.5-7B-SFT-RL-DAPO-90 • 8B • Updated Aug 13, 2025
wh-zhu/Qwen2.5-7B-Instruct-SFT-700 • 8B • Updated Jul 26, 2025
Weak-to-Strong
Weak-to-strong trained models
wh-zhu/OpenMath-nemotron-7B-WSPO • 8B • Updated May 25, 2025
wh-zhu/DeepScaleR-7B-WSPO • 8B • Updated Jun 10, 2025 • 1
Realigner-TrRa
wh-zhu/DeepSeek-R1-TrRa-1.5B-lambda_2 • 2B • Updated Jun 17, 2025
wh-zhu/DeepSeek-R1-TrRa-1.5B-lambda_5 • 2B • Updated Jun 17, 2025 • 2
wh-zhu/DeepSeek-R1-TrRa-1.5B-lambda_10 • 2B • Updated Jun 17, 2025
wh-zhu/DeepSeek-R1-TrRa-iter1-1.5B-lambda_2 • 2B • Updated Jun 17, 2025
Realigner-InRa
Collections of our realigned models
wh-zhu/DeepSeek-R1-Distill-Qwen-1.5B-InRa • Text Generation • 2B • Updated May 7, 2025 • 6
wh-zhu/DeepSeek-R1-Distill-Qwen-7B-InRa • 8B • Updated May 7, 2025 • 1