from `mteb/scripts/task_selection/europe_tasks.csv`:
- i.e. this gives 20 - 2 = 18 translation-pair subsets between the 5 languages (-2 because Italian ↔ German doesn't exist)
- this is done because otherwise there would be 250 translation-pair subsets, which are not all as relevant (e.g. they contain Vietnamese ↔ Portuguese)

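The pair-count arithmetic above can be sketched in a few lines. The concrete language list is an assumption for illustration (the five non-English languages that appear in the per-language results):

```python
from itertools import permutations

# Hypothetical language list (assumption for illustration): the five
# non-English languages evaluated in this card.
languages = ["deu", "fra", "por", "ita", "spa"]

# All ordered translation pairs between the 5 languages: 5 * 4 = 20.
pairs = list(permutations(languages, 2))
print(len(pairs))  # 20

# Italian <-> German doesn't exist, so both directions are dropped: 20 - 2 = 18.
pairs = [p for p in pairs if set(p) != {"ita", "deu"}]
print(len(pairs))  # 18
```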
#### Europe by task

| Model Name | AmazonCounterfactualClassification | BUCC.v2 | DiaBlaBitextMining | MassiveScenarioClassification | NTREXBitextMining | STS17 | Average |
|-------------------------------------------------------|-------------------------------------:|----------:|---------------------:|--------------------------------:|--------------------:|---------:|----------:|
| luminous-base-symmetric | 0.710921 | 0.990569 | 0.85374 | 0.710148 | 0.971263 | 0.879475 | 0.852686 |
| Pharia-7b-2048-medi1-causal-weighted-adapter | 0.735118 | 0.984346 | 0.822481 | 0.749375 | 0.968538 | 0.852473 | 0.852055 |
| Pharia-1-Embedding-4608-control | 0.724946 | 0.991884 | 0.865101 | 0.755763 | 0.982374 | 0.876741 | 0.866135 |
| GritLM-7B | 0.766381 | 0.994298 | 0.864504 | 0.789334 | 0.984593 | 0.880716 | 0.879971 |

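The Average column appears to be the unweighted mean of the six per-task scores; a quick check for two rows, with values copied from the table above:

```python
# Recompute the "Average" column as the unweighted mean of the six task scores.
rows = {
    "luminous-base-symmetric": [0.710921, 0.990569, 0.85374, 0.710148, 0.971263, 0.879475],
    "GritLM-7B": [0.766381, 0.994298, 0.864504, 0.789334, 0.984593, 0.880716],
}
for model, scores in rows.items():
    print(model, round(sum(scores) / len(scores), 6))
# luminous-base-symmetric 0.852686
# GritLM-7B 0.879971
```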
#### Europe by language

| Model Name | deu-Latn | eng-Latn | fra-Latn | por-Latn | ita-Latn | spa-Latn | Average |
|-------------------------------------------------------|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|----------:|
| luminous-base-symmetric | 0.913887 | 0.90055 | 0.929288 | 0.927929 | 0.932836 | 0.93469 | 0.923197 |
| Pharia-7b-2048-medi1-causal-weighted-adapter | 0.914817 | 0.876927 | 0.918247 | 0.938783 | 0.92802 | 0.934084 | 0.91848 |
| Pharia-1-Embedding-4608-control | 0.925309 | 0.902113 | 0.937961 | 0.953719 | 0.942352 | 0.945642 | 0.934516 |
| GritLM-7B | 0.934603 | 0.905669 | 0.942364 | 0.962042 | 0.949731 | 0.947428 | 0.940306 |

## Training Details

### Model architecture

| Hyperparameter | Value |
|:-------:|:-------:|
| Number of layers | 27 |
| Number of attention heads | 36 |
| Head size | 128 |
| Number of Key-Value heads | 4 |
| Hidden dimension size | 4608 |
| MLP expansion factor | 4 |
| MLP type | Standard |
| Vocabulary size | 128,000 |
| Rotary base | 1,000,000 |
| Total parameter count | 7,041,544,704 |
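As a sanity check, the table roughly determines the total parameter count. The sketch below is an approximation under assumptions not stated in the card (untied input/output embeddings, a standard two-matrix MLP, biases and layer norms ignored), so it lands slightly under the reported total:

```python
# Rough parameter-count estimate from the architecture table above.
# Assumptions (not stated in the card): untied input/output embeddings,
# standard 2-matrix MLP, biases and layer norms ignored.
hidden = 4608
layers = 27
n_heads = 36
head_size = 128
kv_heads = 4
mlp_factor = 4
vocab = 128_000

attn = hidden * (n_heads * head_size)        # Q projection
attn += 2 * hidden * (kv_heads * head_size)  # K and V projections (grouped-query)
attn += (n_heads * head_size) * hidden       # output projection
mlp = 2 * hidden * (mlp_factor * hidden)     # up and down projections
total = layers * (attn + mlp) + 2 * vocab * hidden  # + input & output embeddings
print(f"{total:,}")  # 7,040,139,264 -- within ~0.02% of the reported 7,041,544,704
```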
## Bias, Risks, and Limitations

[More Information Needed]

### Training Data