AIME Evaluation result

by Shuaiqi - opened 22 days ago


It seems Qwen3 used a maximum output length of 38,912 tokens for its AIME'24 and AIME'25 evaluations,
but Baichuan-M2-32B used 64K tokens.
