Is everything good with the benchmark?
I haven't seen any models being processed for days, and I wonder if there is anything wrong with the benchmark?
Well, I've patiently been waiting for 9 (!) days for Pantheon to finally get benchmarked, but I don't get the feeling this is being maintained so much anymore.
Unfortunately, the HF research cluster is super full at the moment; here is the small announcement about it. We will evaluate your model, but it will take more time than usual.
The HF research cluster is super full at the moment, which means that evaluations on the Open LLM Leaderboard will slow down 😴
However, if you feel like a recently released model is super important to have there for the community, open a discussion and we'll do our best!
I guess something big is being cooked up
Thanks @saishf for pointing out these resources! This is entirely correct; stay posted and you'll see something cool coming from the other research teams :)
@hooking-dev Thanks for your interest!
@Gryphe I'd like to point out that we evaluated about 1K models over the last month, so "not being maintained so much anymore" feels a bit unfair ^^"
Please take into account that evaluating a 70B takes 10h minimum, and that Hugging Face is GPU middle class, not GPU rich - we can't suddenly increase the number of GPUs we run evaluations on. If you feel like your model deserves a manual evaluation because it's SOTA, for example, please open a dedicated discussion.
Closing.