@etemiz on Hugging Face: "gpt-oss-120B scored 28 (one of the lowest) on AHA leaderboard. not very human…"

posted an update Aug 15, 2025

gpt-oss-120B scored 28 (one of the lowest scores) on the AHA leaderboard. it is not a very human-aligned model.

these kinds of models are not really "free": they cost you your freedom, if you know what i mean.

drmcbride

Aug 16, 2025

where is the link to the AHA leaderboard?

etemiz

Aug 16, 2025

https://huggingface.co/blog/etemiz/aha-leaderboard

phi0112358

Aug 16, 2025

And who else besides you has ever seen this mysterious leaderboard and the questions? Please stop confusing people with your unscientific hocus-pocus.

etemiz

Aug 16, 2025

i will send you some questions if you politely ask

VizorZ0042

Aug 16, 2025

Great! I knew from the release that this model would perform poorly on these types of tasks, mainly due to its stricter censorship compared to other popular models (Llama 4, Claude 3.5, etc.).

etemiz

Aug 16, 2025

yes, it censors more than others. about 1% of the time it didn't answer the question. there may be a correlation between censoring and scoring low on AHA.
