Extending Reinforcement Learning for LLMs with Flow Environment
SII-Jhao Zhang
JingHaoZ
AI & ML interests
Large Reasoning Model, Unified Understanding and Generation in MLLM
Recent Activity
upvoted a paper about 18 hours ago
Self-Distilled RLVR
upvoted a paper 6 days ago
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization