Spaces: CZLC/BenCzechMark
mfajcik committed on Sep 5, 2024

Commit

715ec0a

1 Parent(s): c980420

Update content.py

Files changed (1)

content.py

@@ -95,7 +95,7 @@ We use the following tests, with varying statistical power:
 ### Duel Scoring Mechanism, Win Score
 On each task, each model is scored to each model (up to top-50 currently submitted models). For each model, record proportion of won duels: **Win Score**(WS).
-Next, the *Category Win Score**(CWS), is computed as an average over model's WSs in that category. Similarly, 🇨🇿 **BenCzechMark Win Score** is computed as model's average CWS across categories.
+Next, the **Category Win Score**(CWS), is computed as an average over model's WSs in that category. Similarly, 🇨🇿 **BenCzechMark Win Score** is computed as model's average CWS across categories.
 The properties of this ranking mechanism include:
 - Ranking can change after every submission.
 - The across-task aggregation is interpretable: in words, it measures the average proportion of times the model is better.
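
For readers of this diff, the aggregation described in the changed text (WS per task, CWS per category, overall Win Score across categories) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the leaderboard's implementation: the function names, the nested-dict input layout, and the model, task, and category names are all hypothetical, and a duel is decided here by a plain score comparison, whereas the surrounding document refers to statistical tests for that step.

```python
from statistics import mean


def win_scores(task_scores):
    """Map {model: score} for one task to {model: proportion of duels won}.

    Assumption: a duel is won by having the strictly higher score.
    The real benchmark decides duels differently (statistical tests).
    """
    ws = {}
    for model, score in task_scores.items():
        opponents = [s for m, s in task_scores.items() if m != model]
        if not opponents:
            ws[model] = 0.0  # no duels to win with a single model
            continue
        ws[model] = sum(score > s for s in opponents) / len(opponents)
    return ws


def benchmark_win_score(results):
    """Aggregate {category: {task: {model: score}}} into CWS and overall score.

    Assumes every model has a score on every task.
    """
    cws = {}
    for category, tasks in results.items():
        per_task = [win_scores(scores) for scores in tasks.values()]
        models = per_task[0].keys()
        # CWS: average of the model's WS over the tasks in this category
        cws[category] = {m: mean(ws[m] for ws in per_task) for m in models}
    models = next(iter(cws.values())).keys()
    # Overall score: average of the model's CWS over all categories
    overall = {m: mean(c[m] for c in cws.values()) for m in models}
    return cws, overall


# Hypothetical toy data: two categories, three tasks, three models.
results = {
    "category_a": {
        "task_1": {"m1": 0.9, "m2": 0.8, "m3": 0.7},
        "task_2": {"m1": 0.5, "m2": 0.6, "m3": 0.4},
    },
    "category_b": {
        "task_3": {"m1": 0.3, "m2": 0.2, "m3": 0.1},
    },
}
cws, overall = benchmark_win_score(results)
print(cws)
print(overall)
```

Because categories are averaged after tasks, a category with one task weighs as much in the final score as a category with many, which is what "average CWS across categories" implies.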