spacemanidol committed
Commit 236cea8 • Parent(s): 55416e4
Update README.md

README.md CHANGED
```diff
@@ -2972,7 +2972,7 @@ Query: Where can I get the best tacos?
 ### Using Huggingface transformers
 
 
-You can use the transformers package
+You can use the transformers package for a snowflake-arctic-embed model, as shown below. For optimal retrieval quality, use the CLS token to embed each text portion and use the query prefix below (just on the query).
 
 
 
@@ -2995,14 +2995,14 @@ document_tokens = tokenizer(documents, padding=True, truncation=True, return_te
 # Compute token embeddings
 with torch.no_grad():
     query_embeddings = model(**query_tokens)[0][:, 0]
-
+    document_embeddings = model(**document_tokens)[0][:, 0]
 
 
 # normalize embeddings
 query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
-
+document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)
 
-scores = torch.mm(query_embeddings,
+scores = torch.mm(query_embeddings, document_embeddings.transpose(0, 1))
 for query, query_scores in zip(queries, scores):
     doc_score_pairs = list(zip(documents, query_scores))
     doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
```
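
For readers who want to run the resulting snippet end to end, here is a self-contained sketch of what the updated README section describes. The diff above does not show the setup lines, so the model name (`Snowflake/snowflake-arctic-embed-m`), the query prefix string, and the sample `queries`/`documents` below are assumptions for illustration; check the model card of the variant you actually use.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed for illustration: pick the arctic-embed variant you actually use,
# and confirm its recommended query prefix on the model card.
MODEL_NAME = "Snowflake/snowflake-arctic-embed-m"
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, add_pooling_layer=False)
model.eval()

# Sample inputs, invented for this sketch.
queries = ["Where can I get the best tacos?"]
documents = [
    "The best tacos in town are at the taqueria on 5th Street.",
    "Snowflake is a cloud data platform.",
]

# The prefix is applied to queries only, per the README's guidance.
query_tokens = tokenizer(
    [QUERY_PREFIX + q for q in queries],
    padding=True, truncation=True, return_tensors="pt", max_length=512,
)
document_tokens = tokenizer(
    documents,
    padding=True, truncation=True, return_tensors="pt", max_length=512,
)

# Compute token embeddings, keeping only the CLS token ([:, 0]) of the
# last hidden state, as the README recommends for retrieval quality.
with torch.no_grad():
    query_embeddings = model(**query_tokens)[0][:, 0]
    document_embeddings = model(**document_tokens)[0][:, 0]

# L2-normalize so the dot products below equal cosine similarities.
query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)

# Score every query against every document, then rank documents per query.
scores = torch.mm(query_embeddings, document_embeddings.transpose(0, 1))
for query, query_scores in zip(queries, scores):
    doc_score_pairs = sorted(
        zip(documents, query_scores.tolist()), key=lambda x: x[1], reverse=True
    )
    print(query)
    for doc, score in doc_score_pairs:
        print(f"  {score:.4f}  {doc}")
```

Because both embedding matrices are L2-normalized, the `torch.mm` product yields cosine similarities, so sorting each row of `scores` ranks the documents by relevance to that query.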