Commit 63ce6b3 (verified) · committed by TildeSIA TBergmanis · 1 parent: e3895ef

Update README.md (#2)

- Update README.md (e26c640462de50cca1e321f40611815a28eaefbe)


Co-authored-by: Toms Bergmanis <[email protected]>

Files changed (1)
  1. README.md +29 -30
README.md CHANGED
@@ -120,33 +120,32 @@ Character-level perplexity creates a standardised comparison by calculating how
  **What data did we use?**
  We use WMT24++ as it is a multilingual, language-parallel evaluation set that none of the models have seen during training. WMT24++ is a composite of texts from news, literature, speech, and social media; thus, it is suitable for foundational model benchmarking.
 
- | Language | TildeOpen-30B | Gemma-2-27B | EuroLLM-9B | ALIA-40B |
- |----------|---------------|-------------|------------|-----------------|
- | Bulgarian | **2.1716** | 2.3541 | 2.3502 | 2.2411 |
- | Croatian | **2.2259** | 2.6809 | 2.6780 | 2.3456 |
- | Czech | **2.2682** | 2.4873 | 2.4808 | 2.3639 |
- | Danish | **2.0968** | 2.2608 | 2.2586 | 2.1543 |
- | Dutch | **2.0136** | 2.1249 | 2.1185 | 2.0629 |
- | English | 2.1497 | **2.0342** | 2.1897 | 2.1027 |
- | Estonian | **2.2825** | 2.7163 | 2.5652 | 2.4232 |
- | Finnish | **2.1687** | 2.4069 | 2.3844 | 2.2774 |
- | French | 1.9779 | 2.0195 | 2.0479 | **1.9750** |
- | German | **1.9664** | 2.0214 | 2.0499 | 1.9725 |
- | Hungarian | **2.1481** | 2.3308 | 2.3705 | 2.2493 |
- | Icelandic | **2.2011** | 3.1917 | 5.3162 | 4.0978 |
- | Italian | **2.0431** | 2.1065 | 2.1213 | 2.0604 |
- | Latvian | **2.2477** | 2.6701 | 2.4896 | 2.4352 |
- | Lithuanian | **2.2301** | 2.5495 | 2.4754 | 2.4109 |
- | Norwegian | **2.2445** | 2.4173 | 2.5121 | 2.3152 |
- | Polish | **2.1214** | 2.2294 | 2.2264 | 2.1847 |
- | Portuguese | **2.0810** | 2.1554 | 2.1561 | 2.0884 |
- | Romanian | **2.1266** | 2.2724 | 2.2821 | 2.1974 |
- | Russian | **2.1502** | 2.2091 | 2.2813 | 2.1889 |
- | Serbian | **2.3708** | 2.8053 | 4.7160 | 2.5119 |
- | Slovak | **2.2281** | 2.4674 | 2.4588 | 2.3505 |
- | Slovenian | **2.2662** | 2.5798 | 2.5087 | 2.3611 |
- | Spanish | 2.0400 | 2.0665 | 2.1186 | **2.0055** |
- | Swedish | **2.1471** | 2.2971 | 2.2856 | 2.2039 |
- | Turkish | **2.2108** | 2.3665 | 2.3508 | 3.0611 |
- | Ukrainian | **2.2470** | 2.4000 | 2.4251 | 2.3168 |
-
+ | Language | TildeOpen 30b | Gemma 2 27b | EuroLLM 22B Prev. | ALIA 40B |
+ |-----------------|---------|------------|----|------|
+ | Bulgarian | **2.0539** | 2.2184 | 2.1985 | 2.1336 |
+ | Czech | **2.1579** | 2.3522 | 2.3221 | 2.2719 |
+ | Danish | **2.003** | 2.1517 | 2.1353 | 2.0805 |
+ | German | **1.8769** | 1.9285 | 1.9452 | 1.904 |
+ | English | 2.0378 | **1.9525** | 2.0568 | 2.0261 |
+ | Spanish | 1.9503 | 1.9752 | 2.0145 | **1.9369** |
+ | Estonian | **2.1711** | 2.5747 | 2.3852 | 2.325 |
+ | Finnish | **2.0497** | 2.288 | 2.2388 | 2.1831 |
+ | French | **1.8978** | 1.9355 | 1.9282 | 1.9084 |
+ | Croatian | **2.1147** | 2.544 | 2.4905 | 2.2433 |
+ | Hungarian | **2.0539** | 2.2228 | 2.2256 | 2.1635 |
+ | Icelandic | **2.0873** | 3.0329 | 4.7908 | 3.957 |
+ | Italian | **1.9565** | 2.0137 | 2.0098 | 1.9887 |
+ | Lithuanian | **2.1247** | 2.4175 | 2.3137 | 2.3075 |
+ | Latvian | **2.1439** | 2.5355 | 2.3141 | 2.3276 |
+ | Dutch | **1.9333** | 2.0312 | 2.0079 | 1.9904 |
+ | Norwegian | **2.1284** | 2.2862 | 2.3506 | 2.2253 |
+ | Polish | **2.0241** | 2.1294 | 2.0803 | 2.0803 |
+ | Portuguese | **1.9899** | 2.0597 | 2.0272 | 2.0187 |
+ | Romanian | **2.0196** | 2.1606 | 2.1641 | 2.1114 |
+ | Russian | **2.0424** | 2.09 | 2.1095 | 2.0871 |
+ | Slovak | **2.1192** | 2.338 | 2.3029 | 2.2609 |
+ | Slovenian | **2.1556** | 2.4443 | 2.3398 | 2.2589 |
+ | Serbian | **2.2469** | 2.6351 | 4.2471 | 2.3743 |
+ | Swedish | **2.041** | 2.1809 | 2.1464 | 2.1211 |
+ | Turkish | **2.0997** | 2.247 | 2.2202 | 2.232 |
+ | Ukrainian | **2.1376** | 2.2665 | 2.2691 | 2.2086 |
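
For readers unfamiliar with the metric in these tables, the sketch below shows one common way to compute a character-level perplexity. The README's own definition is cut off in the hunk header above, so this is an assumption rather than the authors' method: it takes the total negative log-likelihood a model assigns to a text (natural-log token probabilities) and normalises by the number of characters instead of the number of tokens, which makes scores comparable across tokenizers. The function name `char_level_perplexity` and its arguments are hypothetical.

```python
import math


def char_level_perplexity(token_log_probs, text):
    """Return exp(total NLL / number of characters).

    Assumption: token_log_probs holds the natural-log probability the model
    assigned to each token of `text`. This is a generic formulation, not
    necessarily the exact procedure used for the tables above.
    """
    if not text:
        raise ValueError("text must be non-empty")
    total_nll = -sum(token_log_probs)  # total negative log-likelihood
    return math.exp(total_nll / len(text))  # normalise by characters, not tokens


# Four tokens, each with probability 0.5, covering an 8-character text.
example = char_level_perplexity([math.log(0.5)] * 4, "abcdefgh")
```

Under this definition lower values are better, which matches how the tables bold the lowest score in each row.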