More Benchmarks

#4
by PSM272 - opened

Can you add more benchmarks like MATH, MMLU, HumanEval, etc.?

Thank you for your attention.

Our model has demonstrated preliminary feasibility on math-related tasks, primarily because data in this area is relatively easy to obtain and the reward mechanisms are straightforward to design. However, our future focus will shift towards non-math tasks, particularly those involving open-ended questions. Therefore, we have no immediate plans to evaluate the model on additional math benchmarks. That said, we will soon present more experimental results and analyses on other tasks. Stay tuned for updates.
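To illustrate the point about reward design (a minimal sketch under my own assumptions, not the authors' actual reward function): for math problems with a single numeric ground truth, a simple exact-match check on the final number in the model's output already serves as a verifiable reward, which is much harder to replicate for open-ended questions.

```python
import re

def math_reward(model_output: str, reference_answer: str) -> float:
    """Rule-based reward: 1.0 if the last number in the model output
    matches the numeric reference answer, else 0.0."""
    # Extract all numbers from the response and keep the final one.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output.replace(",", ""))
    if not numbers:
        return 0.0
    try:
        predicted = float(numbers[-1])
        target = float(reference_answer.replace(",", ""))
    except ValueError:
        return 0.0
    return 1.0 if abs(predicted - target) < 1e-6 else 0.0

# Example with a GSM8K/MGSM-style numeric answer (hypothetical output).
print(math_reward("... so the total is 42.", "42"))  # 1.0
```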

In that case, why have the only reported benchmark be MGSM, a math benchmark?

I apologize for any confusion caused by the incomplete response to the earlier question. The response has now been updated.

Sniper changed discussion status to closed
