from `mteb/scripts/task_selection/europe_tasks.csv`:
- i.e. this gives 20 - 2 = 18 translation-pair subsets between the 5 languages (-2 because Italian ↔ German doesn't exist)
- this is done because otherwise there would be 250 translation-pair subsets, which are not all as relevant (e.g. they contain Vietnamese ↔ Portuguese)

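The pair-count arithmetic above can be sketched in a few lines. The concrete language list is an assumption for illustration (the five non-English languages that appear in the per-language results):

```python
from itertools import permutations

# Hypothetical language list (assumption for illustration): the five
# non-English languages evaluated in this card.
languages = ["deu", "fra", "por", "ita", "spa"]

# All ordered translation pairs between the 5 languages: 5 * 4 = 20.
pairs = list(permutations(languages, 2))
print(len(pairs))  # 20

# Italian <-> German doesn't exist, so both directions are dropped: 20 - 2 = 18.
pairs = [p for p in pairs if set(p) != {"ita", "deu"}]
print(len(pairs))  # 18
```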
#### Europe by task

| Model Name | AmazonCounterfactualClassification | BUCC.v2 | DiaBlaBitextMining | MassiveScenarioClassification | NTREXBitextMining | STS17 | Average |
|-------------------------------------------------------|-------------------------------------:|----------:|---------------------:|--------------------------------:|--------------------:|---------:|----------:|
| luminous-base-symmetric | 0.710921 | 0.990569 | 0.85374 | 0.710148 | 0.971263 | 0.879475 | 0.852686 |
| Pharia-7b-2048-medi1-causal-weighted-adapter | 0.735118 | 0.984346 | 0.822481 | 0.749375 | 0.968538 | 0.852473 | 0.852055 |
| Pharia-1-Embedding-4608-control | 0.724946 | 0.991884 | 0.865101 | 0.755763 | 0.982374 | 0.876741 | 0.866135 |
| GritLM-7B | 0.766381 | 0.994298 | 0.864504 | 0.789334 | 0.984593 | 0.880716 | 0.879971 |

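The Average column appears to be the unweighted mean of the six per-task scores; a quick check for two rows, with values copied from the table above:

```python
# Recompute the "Average" column as the unweighted mean of the six task scores.
rows = {
    "luminous-base-symmetric": [0.710921, 0.990569, 0.85374, 0.710148, 0.971263, 0.879475],
    "GritLM-7B": [0.766381, 0.994298, 0.864504, 0.789334, 0.984593, 0.880716],
}
for model, scores in rows.items():
    print(model, round(sum(scores) / len(scores), 6))
# luminous-base-symmetric 0.852686
# GritLM-7B 0.879971
```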
#### Europe by language

| Model Name | deu-Latn | eng-Latn | fra-Latn | por-Latn | ita-Latn | spa-Latn | Average |
|-------------------------------------------------------|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|----------:|
| luminous-base-symmetric | 0.913887 | 0.90055 | 0.929288 | 0.927929 | 0.932836 | 0.93469 | 0.923197 |
| Pharia-7b-2048-medi1-causal-weighted-adapter | 0.914817 | 0.876927 | 0.918247 | 0.938783 | 0.92802 | 0.934084 | 0.91848 |
| Pharia-1-Embedding-4608-control | 0.925309 | 0.902113 | 0.937961 | 0.953719 | 0.942352 | 0.945642 | 0.934516 |
| GritLM-7B | 0.934603 | 0.905669 | 0.942364 | 0.962042 | 0.949731 | 0.947428 | 0.940306 |

## Training Details

### Model architecture

| Hyperparameter | Value |
|:-------:|:-------:|
| Number of layers | 27 |
| Number of attention heads | 36 |
| Head size | 128 |
| Number of Key-Value heads | 4 |
| Hidden dimension size | 4608 |
| MLP expansion factor | 4 |
| MLP type | Standard |
| Vocabulary size | 128,000 |
| Rotary base | 1,000,000 |
| Total parameter count | 7,041,544,704 |
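As a sanity check, the table roughly determines the total parameter count. The sketch below is an approximation under assumptions not stated in the card (untied input/output embeddings, a standard two-matrix MLP, biases and layer norms ignored), so it lands slightly under the reported total:

```python
# Rough parameter-count estimate from the architecture table above.
# Assumptions (not stated in the card): untied input/output embeddings,
# standard 2-matrix MLP, biases and layer norms ignored.
hidden = 4608
layers = 27
n_heads = 36
head_size = 128
kv_heads = 4
mlp_factor = 4
vocab = 128_000

attn = hidden * (n_heads * head_size)        # Q projection
attn += 2 * hidden * (kv_heads * head_size)  # K and V projections (grouped-query)
attn += (n_heads * head_size) * hidden       # output projection
mlp = 2 * hidden * (mlp_factor * hidden)     # up and down projections
total = layers * (attn + mlp) + 2 * vocab * hidden  # + input & output embeddings
print(f"{total:,}")  # 7,040,139,264 -- within ~0.02% of the reported 7,041,544,704
```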
## Bias, Risks, and Limitations

[More Information Needed]

### Training Data