peralp24 committed on
Commit b38a29b (verified)
1 Parent(s): d5e5c92

Update README.md

Files changed (1):
  1. README.md +14 -24
README.md CHANGED
@@ -27,6 +27,7 @@ in different languages. The finetuning was always performed using English instru
  - **Language(s) (NLP):** Trained on English, German, French, Spanish.
  <!--- **License:** [More Information Needed]-->
  <!--- **Finetuned from model [optional]:** [More Information Needed]-->
+ - **USP:** The model exhibits superior quality on pure cross-lingual tasks (German, English, French & Spanish pairings; see the evaluation below).
 
 
  ### Model Description
@@ -270,6 +271,19 @@ from [mteb/scripts/task_selection/europe_tasks.csv at main · embeddings-benchma
  | Pharia-1-Embedding-4608-control | 0.925309 | 0.902113 | 0.937961 | 0.953719 | 0.942352 | 0.945642 | 0.934516 |
  | GritLM-7B | 0.934603 | 0.905669 | 0.942364 | 0.962042 | 0.949731 | 0.947428 | 0.940306 |
 
+
+ #### Evaluations on cross-lingual capabilities
+ There are important use cases where one wants to retrieve multiple documents on a topic, or answers to questions, that are formulated in a
+ different language than the query; this increases recall and information-retrieval coverage. To test cross-lingual capabilities we
+ evaluated Pharia-1-Embedding-4608-control and GritLM on the MLQA-V1 datasets (Facebook) for the German/English and English/Spanish language pairings.
+ For German/French we used the CLSD-WMT19 dataset, which provides correct and adversarial translations of a sentence in the corresponding pair language.
+
+ |Model Name                     |MLQA-V1 Ger/Eng (2000 samples)|MLQA-V1 Eng/Esp (2000 samples)|CLSD-WMT19 (2900 samples)|
+ |:-----------------------------:|:----------------------------:|:----------------------------:|:-----------------------:|
+ |Pharia-1-Embedding-4608-control|79.5%                         |78.5%                         |95.1%                    |
+ |GritLM-7B                      |73.4%                         |73.9%                         |94.2%                    |
+
+
  ## Training Details
 
  ### Model architecture
@@ -320,27 +334,3 @@ therefore more cost-efficient at inference time.
  |fr|1.896|2.105|1.836|
  |es|1.673|2.030|1.749|
  |en|1.633|1.681|1.410|
-
-
-
-
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
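
The cross-lingual evaluation this commit adds to the README (embed the query and the candidate sentences, rank candidates by cosine similarity, count how often the correct pair-language sentence wins over an adversarial distractor) can be sketched as follows. This is a minimal sketch, not the actual evaluation code: `embed` is a toy character-frequency placeholder standing in for a real sentence-embedding model such as Pharia-1-Embedding-4608-control, and the German/English sample pair is invented for illustration.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in embedding: L2-normalised character-frequency vector.
    NOT a real model -- swap in an actual sentence-embedding model
    (e.g. Pharia-1-Embedding-4608-control) for a real evaluation."""
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Vectors are already L2-normalised, so the dot product is the cosine.
    return float(a @ b)

def retrieval_accuracy(pairs):
    """pairs: iterable of (query, correct_candidate, distractor_list).
    Scores every candidate against the query by cosine similarity and
    counts how often the correct pair-language sentence ranks first."""
    pairs = list(pairs)
    hits = 0
    for query, correct, distractors in pairs:
        q = embed(query)
        best = max([correct] + distractors, key=lambda c: cosine(q, embed(c)))
        hits += best == correct
    return hits / len(pairs)

# Invented German/English example in the spirit of CLSD-WMT19:
# one correct translation plus one adversarial distractor.
pairs = [
    ("Die Katze sitzt auf der Matte.",
     "The cat sits on the mat.",
     ["The dog sleeps in the garden."]),
]
print(f"accuracy: {retrieval_accuracy(pairs):.1%}")
```

On the real datasets the reported numbers are the fraction of queries for which the correct cross-lingual candidate ranks first, which is what `retrieval_accuracy` computes over its sample list.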