GiliGold committed on
Commit
41ef42d
·
verified ยท
1 Parent(s): f471414

Update README.md

Files changed (1)
  1. README.md +66 -56
README.md CHANGED
@@ -1,56 +1,66 @@
- ---
- pipeline_tag: sentence-similarity
- tags:
- - sentence-transformers
- - feature-extraction
- - sentence-similarity
-
- ---
-
- # {MODEL_NAME}
-
- This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 1024 dimensional dense vector space and can be used for tasks like clustering or semantic search.
-
- <!--- Describe your model here -->
-
- ## Usage (Sentence-Transformers)
-
- Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
-
- ```
- pip install -U sentence-transformers
- ```
-
- Then you can use the model like this:
-
- ```python
- from sentence_transformers import SentenceTransformer
- sentences = ["This is an example sentence", "Each sentence is converted"]
-
- model = SentenceTransformer('{MODEL_NAME}')
- embeddings = model.encode(sentences)
- print(embeddings)
- ```
-
-
- ## Evaluation Results
-
- <!--- Describe how your model was evaluated -->
-
- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
-
-
- ## Full Model Architecture
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
-   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
-   (2): Normalize()
- )
- ```
-
- ## Citing & Authors
-
- <!--- Describe where people can find more information -->

+ ---
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ datasets:
+ - HaifaCLGroup/KnessetCorpus
+ language:
+ - he
+ base_model:
+ - intfloat/multilingual-e5-large
+ ---
+
+ # Knesset-multi-e5-large
+
+ This is a [sentence-transformers](https://www.sbert.net) model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for tasks like clustering or semantic search.
+
+ This model is based on [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large).
+ The transformer encoder has been fine-tuned on [Knesset data](https://huggingface.co/datasets/HaifaCLGroup/KnessetCorpus) to better capture legislative and parliamentary language.
+
+ ## Usage (Sentence-Transformers)
+
+ Using this model is straightforward if you have [sentence-transformers](https://www.sbert.net) installed:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can use the model like this:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Hebrew: "This is a first example sentence", "This is the second sentence"
+ sentences = ["זה משפט ראשון לדוגמה", "זה המשפט השני"]
+
+ model = SentenceTransformer('Knesset-multi-e5-large')
+ embeddings = model.encode(sentences)
+ print(embeddings)
+ ```
+
+
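+ Because the model ends with a `Normalize()` layer (see the full architecture below), its embeddings are unit-length, so cosine similarity reduces to a dot product. The following is a minimal semantic-search sketch using `sentence_transformers.util`; the Hebrew corpus and query are hypothetical examples, not taken from the Knesset data:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer('Knesset-multi-e5-large')
+
+ # A hypothetical corpus of parliamentary-style sentences.
+ corpus = [
+     "הוועדה דנה בהצעת החוק",       # "The committee discussed the bill"
+     "חבר הכנסת נשא נאום במליאה",   # "The Knesset member gave a speech in the plenum"
+     "מזג האוויר היה נעים",         # "The weather was pleasant"
+ ]
+ query = "דיון בוועדה על הצעת חוק"  # "A committee discussion about a bill"
+
+ corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
+ query_embedding = model.encode(query, convert_to_tensor=True)
+
+ # Rank corpus sentences by cosine similarity to the query.
+ hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
+ for hit in hits:
+     print(corpus[hit["corpus_id"]], round(hit["score"], 3))
+ ```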
+ ## Evaluation Results
+
+ <!--- Describe how your model was evaluated -->
+
+ For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=Knesset-multi-e5-large)
+
+ ## Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
+   (2): Normalize()
+ )
+ ```
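+
+ For reference, the same pipeline can be reproduced with plain `transformers`: mean pooling over non-padding tokens followed by L2 normalization. This is an illustrative sketch based on the architecture above, not an official recipe, and it assumes the same model identifier as in the usage example:
+
+ ```python
+ import torch.nn.functional as F
+ from transformers import AutoTokenizer, AutoModel
+
+ tokenizer = AutoTokenizer.from_pretrained("Knesset-multi-e5-large")
+ model = AutoModel.from_pretrained("Knesset-multi-e5-large")
+
+ # Hebrew: "This is an example sentence"
+ batch = tokenizer(["זה משפט לדוגמה"], padding=True, truncation=True,
+                   max_length=512, return_tensors="pt")
+ outputs = model(**batch)
+
+ # Mean pooling over valid tokens, mirroring the Pooling layer above.
+ mask = batch["attention_mask"].unsqueeze(-1).float()
+ embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
+
+ # The Normalize() module corresponds to L2 normalization.
+ embeddings = F.normalize(embeddings, p=2, dim=1)
+ print(embeddings.shape)  # (1, 1024)
+ ```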
+
+ ## Additional Details
+
+ - **Base Model:** [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large)
+ - **Fine-Tuning Data:** [Knesset Corpus](https://huggingface.co/datasets/HaifaCLGroup/KnessetCorpus)
+ - **Key Modifications:** The encoder has been fine-tuned on Knesset data to improve performance on legislative and parliamentary content. The original pooling and normalization layers are retained, so the model's embeddings remain consistent with the architecture of the base model.
+
+ ## Citing & Authors
+
+ <!--- Describe where people can find more information -->
+
+ TBD