Aleph-Alpha/Pharia-1-Embedding-4608-control

peralp24 committed on Nov 28, 2024

Commit 1076870 · verified · 1 Parent(s): 5f23e11

Update README.md

Files changed (1)

README.md +13 -11

@@ -186,17 +186,19 @@ Compare the similarity
 We leave the embeddings of the documents untouched and now obtain the following cosine similarities:
 Query vs. German TV show: ~0.632
 Query vs. Italian polymath: ~0.512
-These new cosine similarities imply that the ranking has indeed changed and the paragraph about the German TV show is now more relevant. This shows that instructions can help the model understand nuances in the data better and ultimately lead to embeddings that are more useful for your use-case.
-Tips on using the model
-First try and ideally evaluate the model on your data without instructions to see whether performance aligns with your expectations out-of-the-box
-If you decide to use an instruction with the aim of further boosting performance we suggest using this template as a guideline
-Template: Represent the [X] to find a [Y] that [describe how the X and Y relate]
-Examples
-Represent the newspaper paragraph to find a newspaper paragraph with the same topic
-Represent the sentence to find another sentence with the same meaning
-In cases where the two texts to compare are different in nature (e.g. query and document) – also called “asymmetric” – we suggest to first add an instruction to query texts only. Again, try and ideally evaluate the model in this setting. Then, if your aim is to further boost performance, we suggest that you add instructions to document texts as well where [X] and [Y] are flipped accordingly.

+These new cosine similarities imply that the ranking has indeed changed and the paragraph about the German TV show is
+**now more relevant**. This shows that instructions can help the model understand nuances in the data better
+and ultimately lead to embeddings that are more useful for your use-case.
+#### Tips on using the model
+- First try and ideally evaluate the model on your data without instructions to see whether performance aligns with your expectations out-of-the-box
+- If you decide to use an instruction with the aim of further boosting performance we suggest using this template as a guideline
+  * ```Template: Represent the [X] to find a [Y] that [describe how the X and Y relate]```
+  * Examples
+    1. Represent the newspaper paragraph to find a newspaper paragraph with the same topic
+    2. Represent the sentence to find another sentence with the same meaning
+- In cases where the two texts to compare are different in nature (e.g. query and document) – also called “asymmetric” – we suggest first adding an instruction to query texts only. Again, try and ideally evaluate the model in this setting. Then, if your aim is to further boost performance, we suggest that you add instructions to document texts as well, with [X] and [Y] flipped accordingly.
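
The template from the tips can be filled in programmatically. The following is a minimal sketch, not part of the README or the model's API: `build_instruction` and the query/document wording are hypothetical, and the article ("a", "another") is passed as part of `y` so that both README examples are reproduced exactly.

```python
def build_instruction(x: str, y: str, relation: str) -> str:
    """Fill the template 'Represent the [X] to find a [Y] that [...]'.

    Hypothetical helper: `y` carries its own article ("a ...", "another ...")
    and `relation` carries its own connective ("with ...", "that ...").
    """
    return f"Represent the {x} to find {y} {relation}"


# Symmetric cases: the two README examples.
topic_instruction = build_instruction(
    "newspaper paragraph", "a newspaper paragraph", "with the same topic"
)
meaning_instruction = build_instruction(
    "sentence", "another sentence", "with the same meaning"
)

# Asymmetric case (assumed wording): start with an instruction on the query
# side only, then optionally add a flipped instruction on the document side.
query_instruction = build_instruction(
    "query", "a document", "that answers the query"
)
document_instruction = build_instruction(
    "document", "a query", "that the document answers"
)

if __name__ == "__main__":
    for text in (topic_instruction, meaning_instruction,
                 query_instruction, document_instruction):
        print(text)
```

In the asymmetric case the relation text usually has to be reworded when [X] and [Y] are swapped, which is why the sketch takes it as a separate argument instead of deriving it automatically.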