MMLU Pro
More advanced and challenging multi-task evaluation
Compare LLMs on role consistency across contexts
Embed and use ZeroEval for evaluation tasks
Display model leaderboard evaluations
Browse and submit LLM evaluations
Compact LLM Battle Arena: Frugal AI Face-Off!
VLMEvalKit evaluation results on video understanding benchmarks
Track, rank and evaluate open LLMs and chatbots
Blind vote on HF TTS models!