It seems Qwen3 used a maximum output length of 38,912 tokens for its AIME’24 and AIME’25 evaluations, but Baichuan-M2-32B used 64K tokens.