peralp24 committed
Commit c0962b2 (verified)
1 Parent(s): 40a5d17

Update README.md

Files changed (1): README.md +38 -10

README.md CHANGED
@@ -273,16 +273,44 @@ from [mteb/scripts/task_selection/europe_tasks.csv at main · embeddings-benchma
 
 
 #### Evaluations on cross-lingual capabilities
- There are important use cases where one wants to retrieve multiple documents on a topic or answering questions that are formulated in a
- different language than the query. This increases recall and information retrieval coverage. For testing on cross-lingual capabilities
- evaluated Pharia-1-Embedding-4608-control and GritLM on the MLQA-V1 datasets (Facebook) for German/English and English/Spanish language pairings.
- For German/French we used the CLSD-WMT19 dataset providing correct and adversarial translations of a sentence in the corresponding pair language.
-
- |Model Name |MLQA-V1 Ger/Eng (2000 samples)|MLQA-V1 Eng/Esp (2000 samples)|CLSD-WMT19 Ger/Fra (2900 samples)|
- |:-----------------------------:|:----------------------------:|:----------------------------:|:-------------------------------:|
- |Pharia-1-Embedding-4608-control|79.5% |78.5% |95.1% |
- |GritLM-7B |73.4% |73.9% |94.2% |
- |Nvidia-Embed-v2 |69.4% |70.7% |93.4% |
+ There are important use cases where one wants to retrieve documents on a topic, or answers to questions, that are formulated in a
+ different language than the query. This increases recall and information-retrieval coverage. To test cross-lingual capabilities
+ we evaluated Pharia-1-Embedding-4608-control, GritLM and Nvidia-Embed-v2 on the MLQA-V1 datasets (Facebook) for the German/English and
+ English/Spanish language pairings. For German/French we used the CLSD-WMT19 dataset, which provides correct and adversarial translations
+ of a sentence in the corresponding pair language. To check quality over a larger range of sample sizes we computed accuracies
+ for varying numbers of samples taken from the MLQA-V1 dataset. For the CLSD-WMT19 evaluation we used the
+ full set of available data (2,900 samples).
+
+
+ #### MLQA-V1 Ger/Eng cross-lingual accuracies for the considered models
+
+ |# of samples|Pharia4608|GritLM|Nvidia-Embed-v2|
+ |:---:|:---:|:---:|:---:|
+ |1000|86.0%|82.5%|77.0%|
+ |2000|79.5%|73.4%|69.4%|
+ |4000|65.3%|59.2%|56.0%|
+ |6000|54.3%|48.6%|45.6%|
+ |10000|38.6%|32.8%|32.8%|
+
+
+ #### MLQA-V1 Eng/Esp cross-lingual accuracies for the considered models
+
+ |# of samples|Pharia4608|GritLM|Nvidia-Embed-v2|
+ |:---:|:---:|:---:|:---:|
+ |1000|87.5%|82.0%|81.5%|
+ |2000|78.5%|73.9%|70.7%|
+ |4000|65.5%|59.3%|56.9%|
+ |6000|55.3%|49.2%|46.2%|
+ |10000|41.7%|35.5%|33.2%|
+
+ #### CLSD-WMT19 Ger/Fra (2900 samples) cross-lingual evaluation for the considered models
+
+
+ |Model Name |Accuracy|
+ |:-----------------------------:|:------:|
+ |Pharia-1-Embedding-4608-control|95.1% |
+ |GritLM-7B |94.2% |
+ |Nvidia-Embed-v2 |93.4% |
 
 
 ## Training Details
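
As an illustration of the cross-lingual evaluation described in the diff above, the following is a minimal sketch of a top-1 retrieval accuracy computation: questions in one language are embedded and matched against a pool of contexts in the other language, and a hit is counted when a question's paired context is its nearest neighbour. The dataset id `facebook/mlqa`, the config name `mlqa.de.en`, the stand-in model `intfloat/multilingual-e5-large`, and the helper functions below are assumptions for illustration, not the evaluation code behind the reported numbers; running Pharia-1-Embedding-4608-control itself may additionally require the usage recipe (e.g. instruction prefixes) given elsewhere in this README.

```python
# Illustrative sketch only; names of dataset configs and the stand-in model are assumptions.
import numpy as np
from datasets import load_dataset
from sentence_transformers import SentenceTransformer


def cross_lingual_accuracy(model, questions, contexts):
    """Fraction of questions whose paired context is the top-1 match by cosine similarity."""
    q_emb = model.encode(questions, normalize_embeddings=True)
    c_emb = model.encode(contexts, normalize_embeddings=True)
    scores = q_emb @ c_emb.T                  # cosine similarities (embeddings are L2-normalised)
    top1 = scores.argmax(axis=1)              # best-matching context index per question
    # Duplicate contexts in the pool are not deduplicated in this sketch.
    return float((top1 == np.arange(len(questions))).mean())


def clsd_pairwise_accuracy(model, sources, correct, adversarial):
    """Fraction of source sentences whose correct translation outscores the adversarial one."""
    s = model.encode(sources, normalize_embeddings=True)
    c = model.encode(correct, normalize_embeddings=True)
    a = model.encode(adversarial, normalize_embeddings=True)
    return float(((s * c).sum(axis=1) > (s * a).sum(axis=1)).mean())


if __name__ == "__main__":
    # English questions paired with German contexts (assumed MLQA config naming).
    data = load_dataset("facebook/mlqa", "mlqa.de.en", split="test")
    model = SentenceTransformer("intfloat/multilingual-e5-large")  # stand-in embedding model

    for n in (1000, 2000, 4000):
        subset = data.select(range(min(n, len(data))))
        acc = cross_lingual_accuracy(model, subset["question"], subset["context"])
        print(f"{len(subset):>5} samples: top-1 accuracy = {acc:.1%}")
```

Because every question is scored against the full candidate pool, accuracy under this protocol naturally decreases as the number of samples grows, which is consistent with the trend across the sample sizes in the tables above.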