Update README.md (#2)
- Update README.md (e26c640462de50cca1e321f40611815a28eaefbe)
Co-authored-by: Toms Bergmanis <[email protected]>
README.md
CHANGED
@@ -120,33 +120,32 @@ Character-level perplexity creates a standardised comparison by calculating how
 **What data did we use?**
 We use WMT24++ as it is a multilingual, language-parallel evaluation set that none of the models have seen during training. WMT24++ is a composite of texts from news, literature, speech, and social media; thus, it is suitable for foundational model benchmarking.

-| Language | TildeOpen
-| Bulgarian | **2.
-| Estonian | **2.
-| Finnish | **2.
-| French | 1.
-| Hungarian | **2.
-| Icelandic | **2.
-| Italian | **
-| Slovak | **2.
-| Slovenian | **2.
-| Swedish | **2.
-| Turkish | **2.
-| Ukrainian | **2.
+| Language | TildeOpen 30b | Gemma 2 27b | EuroLLM 22B Prev. | ALIA 40B |
+|-----------------|---------|------------|----|------|
+| Bulgarian | **2.0539** | 2.2184 | 2.1985 | 2.1336 |
+| Czech | **2.1579** | 2.3522 | 2.3221 | 2.2719 |
+| Danish | **2.003** | 2.1517 | 2.1353 | 2.0805 |
+| German | **1.8769** | 1.9285 | 1.9452 | 1.904 |
+| English | 2.0378 | **1.9525** | 2.0568 | 2.0261 |
+| Spanish | 1.9503 | 1.9752 | 2.0145 | **1.9369** |
+| Estonian | **2.1711** | 2.5747 | 2.3852 | 2.325 |
+| Finnish | **2.0497** | 2.288 | 2.2388 | 2.1831 |
+| French | **1.8978** | 1.9355 | 1.9282 | 1.9084 |
+| Croatian | **2.1147** | 2.544 | 2.4905 | 2.2433 |
+| Hungarian | **2.0539** | 2.2228 | 2.2256 | 2.1635 |
+| Icelandic | **2.0873** | 3.0329 | 4.7908 | 3.957 |
+| Italian | **1.9565** | 2.0137 | 2.0098 | 1.9887 |
+| Lithuanian | **2.1247** | 2.4175 | 2.3137 | 2.3075 |
+| Latvian | **2.1439** | 2.5355 | 2.3141 | 2.3276 |
+| Dutch | **1.9333** | 2.0312 | 2.0079 | 1.9904 |
+| Norwegian | **2.1284** | 2.2862 | 2.3506 | 2.2253 |
+| Polish | **2.0241** | 2.1294 | 2.0803 | 2.0803 |
+| Portuguese | **1.9899** | 2.0597 | 2.0272 | 2.0187 |
+| Romanian | **2.0196** | 2.1606 | 2.1641 | 2.1114 |
+| Russian | **2.0424** | 2.09 | 2.1095 | 2.0871 |
+| Slovak | **2.1192** | 2.338 | 2.3029 | 2.2609 |
+| Slovenian | **2.1556** | 2.4443 | 2.3398 | 2.2589 |
+| Serbian | **2.2469** | 2.6351 | 4.2471 | 2.3743 |
+| Swedish | **2.041** | 2.1809 | 2.1464 | 2.1211 |
+| Turkish | **2.0997** | 2.247 | 2.2202 | 2.232 |
+| Ukrainian | **2.1376** | 2.2665 | 2.2691 | 2.2086 |
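
The hunk header cuts off the README's description of character-level perplexity, so the exact formula used for this table is not visible in the diff. As a rough aid to reading the numbers, here is a minimal sketch that assumes the common definition: the exponential of the text's total negative log-likelihood divided by its character count. The function name, the sample string, and the log-probabilities are hypothetical and are not taken from the README.

```python
import math


def char_level_perplexity(token_logprobs, text):
    """Assumed definition: exp(total negative log-likelihood / number of characters).

    token_logprobs: natural-log probabilities a model assigned to each token of `text`.
    Normalising by characters instead of tokens makes scores comparable across
    models whose tokenizers split the same text into different numbers of tokens.
    """
    if not text:
        raise ValueError("text must be non-empty")
    total_nll = -sum(token_logprobs)
    return math.exp(total_nll / len(text))


# Hypothetical example: an 8-character string scored as 3 tokens.
text = "Labdien!"
token_logprobs = [-2.0, -1.5, -2.5]  # total log-probability = -6.0

ppl = char_level_perplexity(token_logprobs, text)
print(round(ppl, 4))  # exp(6.0 / 8) = exp(0.75), about 2.117
```

Under this assumed definition, lower values mean the model assigns higher probability to each character of the evaluation text, which is how the bolded per-language minima in the table would be read.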