Inconsistency on AIME 24 benchmark

#15

by Jung - opened 7 days ago

Hi, there is an inconsistency in the AIME 2024 results for phi4-reasoning-plus.

(Other numbers, e.g. OmniMath and AIME25, are consistent.)

Jung changed discussion title from Benchmark on AIME 24 to Inconsistencu on AIME 24 benchmark 7 days ago

Jung changed discussion title from Inconsistencu on AIME 24 benchmark to Inconsistency on AIME 24 benchmark 7 days ago
