Need more model diversity
So, I've been waiting for benchmarks on other models such as RWKV Raven 14B, and the small collection of other high-performing non-llama models.
The leaderboard is unfortunately 95% llama-based, so when there are non-llama models to benchmark, it would be best to give them a higher testing priority.
Yeah, we're targeting this for the next batch of human / GPT-4 evals.
As for which models are on the first tab, it's primarily driven by what users submit.
Is there an issue with running RWKV Raven 14B? It seems to have been in the running state for something like three weeks now, and there still aren't any results for any RWKV variants. I'd guess there must be some kind of configuration issue? I assume things are semi-paused because of the blog post, though?
Anyway, just wanted to ping that the RWKV models are most likely stuck.
(There still isn't an RWKV-based model on the leaderboard.)
Hi @spaceman7777! We released a very big update of the LLM leaderboard today, and we'll focus on going through the backlog of models (some have been stuck for quite a bit).
Thank you for your patience :)