OpenEvals
AI & ML interests: LLM evaluation
Organization Card
Hi! Welcome to the org page of the Evaluation team at Hugging Face. We support the community in building and sharing quality evaluations, enabling reproducible and fair model comparisons that cut through release hype and reflect actual model capabilities.
We're behind:
- lighteval, a fast LLM evaluation suite that ships with the state-of-the-art benchmarks you're likely to need
- the evaluation guidebook, your reference for LLM evals
- the Leaderboards on the Hub initiative, which encourages people to build leaderboards in the open for more reproducible evaluation. You'll find documentation here to build your own, and you can look for the best leaderboard for your use case here!
Our archived projects:
- Open LLM Leaderboard (over 11K models evaluated since 2023)
We're not behind the evaluate metrics guide, but if you want to understand metrics better, we really recommend checking it out!
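If you're curious what metric computation looks like in practice, here is a minimal sketch using the evaluate library; the metric name ("accuracy") and the toy predictions are purely illustrative assumptions, so swap in whatever fits your task.

```python
# Minimal sketch: computing a metric with the `evaluate` library.
# The metric and the toy data below are illustrative only.
import evaluate

accuracy = evaluate.load("accuracy")

# Compare model predictions against gold references.
result = accuracy.compute(
    predictions=[0, 1, 1, 0],
    references=[0, 1, 0, 0],
)
print(result)  # {'accuracy': 0.75}
```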
Collections
This leaderboard has been evaluating LLMs since June 2024 on IFEval, MuSR, GPQA, MATH, BBH, and MMLU-Pro. The collection gathers:
- Open-LLM performances are plateauing, let's make the leaderboard steep again: update the leaderboard for fair model evaluation
- Open LLM Leaderboard: track, rank and evaluate open LLMs and chatbots
- open-llm-leaderboard/contents (dataset)
- open-llm-leaderboard/results (dataset)
Models
None public yet

Datasets
None public yet