Introduction


MENTOR is a reinforcement learning framework for LLMs that provides expert guidance only at critical decision points, rather than having the model imitate entire expert trajectories, so that exploration remains both effective and diverse.

Key Highlights

  • Selective Expert Guidance: Injects expert signals only at critical decision points, avoiding full-trajectory imitation.
  • Effective & Diverse Exploration: Balances expert guidance with autonomous exploration, preventing entropy collapse.
  • Absorb Essence, Remove Redundancy: Captures essential expert strategies while discarding unnecessary patterns.
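
To make the term "entropy collapse" concrete, the sketch below computes the Shannon entropy of a next-token distribution. It is only an illustration of the standard definition: the function name and the two example distributions are hypothetical, and nothing here describes how MENTOR itself measures entropy or selects critical decision points.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    # Zero-probability tokens contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical distributions over four candidate tokens.
diverse = [0.25, 0.25, 0.25, 0.25]    # policy still explores all options
collapsed = [0.97, 0.01, 0.01, 0.01]  # policy has committed to one token

print(token_entropy(diverse))
print(token_entropy(collapsed))
```

A policy whose per-token entropy shrinks toward zero during training samples nearly identical responses, which is the loss of diversity the highlights above refer to.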

Chat Template

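def build_MENTOR_chat_template(question, tokenizer):
    """Return `question` formatted as a prompt string via the tokenizer's chat template."""
    # The system prompt asks for reasoning inside <think> tags and the final
    # answer inside a LaTeX boxed command.
    system_prompt = (
        "You are a helpful AI Assistant that provides well-reasoned and detailed responses. "
        "You FIRST think about the reasoning process as an internal monologue and "
        "then provide the final answer. The reasoning process MUST BE enclosed "
        "within <think> </think> tags. The final answer MUST BE put in \\boxed{}."
    )
    return tokenizer.apply_chat_template(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        tokenize=False,  # return the prompt as text, not token ids
        add_generation_prompt=True,  # end with the assistant turn header
    )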
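
Because the system prompt asks for reasoning inside <think> </think> tags and the final answer inside \boxed{}, a response can be split back into those two parts after generation. The helper below is a minimal sketch of one way to do that; `parse_mentor_response` is a hypothetical name, not part of the released code, and it assumes the model followed the requested format.

```python
import re

def parse_mentor_response(text):
    """Split a response into (reasoning, final_answer); either may be None."""
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = think.group(1).strip() if think else None

    # Locate the last \boxed{ and walk forward while counting braces, so
    # nested groups such as \frac{1}{2} stay intact.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return reasoning, None
    begin = start + len(marker)
    depth, pos = 1, begin
    while pos < len(text) and depth > 0:
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
        pos += 1
    # depth > 0 means the closing brace was never found (truncated output).
    answer = text[begin:pos - 1].strip() if depth == 0 else None
    return reasoning, answer

# Hypothetical model output used only to exercise the parser.
sample = "<think>1/4 + 1/4 = 1/2</think> The answer is \\boxed{\\frac{1}{2}}."
reasoning, answer = parse_mentor_response(sample)
```

Brace counting is used instead of a single regular expression because answers often contain nested LaTeX braces, which a non-greedy pattern would cut short.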

Citation

If you find our model useful, please cite our paper:

@article{jiang2025selective,
  title={Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs},
  author={Jiang, Zishang and Han, Jinyi and Li, Tingyun and Wang, Xinyi and Jiang, Sihang and Liang, Jiaqing and Dai, Zhaoqian and Ma, Shuguang and Yu, Fei and Xiao, Yanghua},
  journal={arXiv preprint arXiv:2510.04140},
  year={2025}
}

Model size: 8B params (Safetensors), tensor type: BF16