Best practice for QwQ-32B evaluation
#55
by wangxingjun778 - opened
Best practice: https://evalscope.readthedocs.io/en/latest/best_practice/eval_qwq.html
EvalScope LLM Evaluation Framework: https://github.com/modelscope/evalscope
- Supports "Overthinking" and "Underthinking" evaluation
- Supports performance evaluation broken down by math difficulty level
Nice. How do you keep QwQ-32B from overthinking?
This most likely needs to be addressed during the model training phase, for example by designing a reward function tailored to problem difficulty and adding appropriate penalty terms. Some tricks can also be borrowed from this article: DAPO: an Open-Source LLM Reinforcement Learning System at Scale. https://arxiv.org/pdf/2503.14476
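As one concrete illustration of such a penalty term, the DAPO paper linked above describes a "soft overlong punishment": responses within budget are unpenalized, responses inside a buffer before the hard length cap receive a linearly growing penalty, and responses past the cap receive the full penalty. A minimal sketch (the parameter values here are illustrative, not the ones used in any particular training run):

```python
def soft_overlong_penalty(length: int, max_len: int = 20480, cache: int = 4096) -> float:
    """Length-based reward penalty in the spirit of DAPO's
    'soft overlong punishment' (arXiv:2503.14476).

    - At or under (max_len - cache) tokens: no penalty.
    - Inside the final `cache`-token buffer: penalty grows linearly to -1.
    - Past max_len: full penalty of -1.
    """
    threshold = max_len - cache
    if length <= threshold:
        return 0.0
    if length <= max_len:
        return (threshold - length) / cache
    return -1.0


# Example: a response halfway into the buffer gets half the penalty.
print(soft_overlong_penalty(16384))  # 0.0  (exactly at the threshold)
print(soft_overlong_penalty(18432))  # -0.5 (halfway through the buffer)
print(soft_overlong_penalty(25000))  # -1.0 (past the hard cap)
```

Adding this term to the task reward discourages unnecessarily long chains of thought without hard-truncating responses that only slightly exceed the budget.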