Best practice for QwQ-32B evaluation
#55
by wangxingjun778 - opened
Best practice: https://evalscope.readthedocs.io/en/latest/best_practice/eval_qwq.html
EvalScope LLM Evaluation Framework: https://github.com/modelscope/evalscope
- Supports "Overthinking" and "Underthinking" evaluation
- Supports performance evaluation broken down by math difficulty level
Nice. How do you keep QwQ-32B from overthinking?
This most likely needs to be addressed during the model training phase, for example by designing a reward function tailored to problem difficulty and adding appropriate penalty terms. Some tricks can also be borrowed from this article: DAPO: an Open-Source LLM Reinforcement Learning System at Scale. https://arxiv.org/pdf/2503.14476
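As one concrete illustration of such a penalty term, the DAPO paper linked above describes a "soft overlong punishment": responses within budget are unpenalized, responses inside a buffer before the hard length cap receive a linearly growing penalty, and responses past the cap receive the full penalty. A minimal sketch (the parameter values here are illustrative, not the ones used in any particular training run):

```python
def soft_overlong_penalty(length: int, max_len: int = 20480, cache: int = 4096) -> float:
    """Length-based reward penalty in the spirit of DAPO's
    'soft overlong punishment' (arXiv:2503.14476).

    - At or under (max_len - cache) tokens: no penalty.
    - Inside the final `cache`-token buffer: penalty grows linearly to -1.
    - Past max_len: full penalty of -1.
    """
    threshold = max_len - cache
    if length <= threshold:
        return 0.0
    if length <= max_len:
        return (threshold - length) / cache
    return -1.0


# Example: a response halfway into the buffer gets half the penalty.
print(soft_overlong_penalty(16384))  # 0.0  (exactly at the threshold)
print(soft_overlong_penalty(18432))  # -0.5 (halfway through the buffer)
print(soft_overlong_penalty(25000))  # -1.0 (past the hard cap)
```

Adding this term to the task reward discourages unnecessarily long chains of thought without hard-truncating responses that only slightly exceed the budget.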