rsajja committed
Commit 860f7c8 · verified · Parent: 16cf55e

Update README.md

Files changed (1): README.md (+360, −356)

README.md:
---
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:2244
- loss:MultipleNegativesRankingLoss
- Education
- Retrieval
- Syllabus
widget:
- source_sentence: Participation in both lecture and discussion sections is required.
  sentences:
  - >-
    Service learning: 1.5 to 2 units depending on portfolio evaluation. Must
    meet 15 service hours for max credit.
  - >-
    Students are expected to attend both lectures and discussion sessions for
    full participation credit.
  - 'General info mailbox: replies from Dr. Anil Goyal and Prof. Lucy Salgado.'
- source_sentence: What is the name of the TA?
  sentences:
  - The instructors on record are Prof. Jae Sun Kim and Dr. Bea Valdez.
  - TAs include Priya Reddy and Lena Hoffmann.
  - 'The official hours: 4 credits.'
- source_sentence: Who are the teaching staff?
  sentences:
  - 'Teaching staff: Dr. Doris Ren, Prof. David Sarpong, Prof. Olivia Boland.'
  - 'Teaching faculty: Dr. Malak Mahfouz and Prof. William Ruiz.'
  - 'Lab teaching assistants: Julio Mendez, Siri Eriksson.'
- source_sentence: Who operates the Canvas FAQ as TA?
  sentences:
  - 'FAQ moderators: Pia Fenwick, Pilar Nasser. Email canvasfaq@school.edu.'
  - 'Designation: 3 credit hours.'
  - 'Lab: 1 hour (semester).'
- source_sentence: What is the eligibility requirement to register for extra unit load?
  sentences:
  - Completion awards three credit hours.
  - 'Eligibility for extra: GPA >= 3.5 and at least 30 completed units.'
  - 'The academic credit hours: 2'
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
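
Because the final `Normalize()` module maps every embedding onto the unit sphere, cosine similarity and dot product give identical rankings. A minimal sanity check of this property (a sketch, using the model id from the Usage section below):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("rsajja/Fine-tuned-Educational-Model-MNRL")

# Encode an arbitrary sentence; the Normalize() module makes the vector unit-length,
# so the dot product between two embeddings equals their cosine similarity.
emb = model.encode(["How many credit hours does the lab carry?"])
print(emb.shape)                    # (1, 384)
print(np.linalg.norm(emb, axis=1))  # approximately [1.0]
```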

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rsajja/Fine-tuned-Educational-Model-MNRL")
# Run inference
sentences = [
    'What is the eligibility requirement to register for extra unit load?',
    'Eligibility for extra: GPA >= 3.5 and at least 30 completed units.',
    'Completion awards three credit hours.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
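
Beyond pairwise similarity, the embeddings can drive semantic search over a collection of syllabus snippets with the library's `util.semantic_search` helper. This is a small illustrative sketch; the corpus entries are examples reused from the widget above, not a real course catalog:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("rsajja/Fine-tuned-Educational-Model-MNRL")

# Illustrative corpus of syllabus snippets (not the training data)
corpus = [
    "Teaching staff: Dr. Doris Ren, Prof. David Sarpong, Prof. Olivia Boland.",
    "Credit load: 3.5 hours",
    "Eligibility for extra: GPA >= 3.5 and at least 30 completed units.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("Who are the teaching staff?", convert_to_tensor=True)

# Returns the top_k most similar corpus entries for the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```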

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 2,244 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                        | sentence_1                                                                        |
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                            | string                                                                            |
  | details | <ul><li>min: 4 tokens</li><li>mean: 9.79 tokens</li><li>max: 21 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 17.6 tokens</li><li>max: 98 tokens</li></ul> |
* Samples:
  | sentence_0                                                           | sentence_1                                                                                    |
  |:----------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|
  | <code>Students with disabilities may request accommodations.</code> | <code>Accessibility services are available to students when needed.</code>                   |
  | <code>Who teaches the course?</code>                                 | <code>This course is taught collaboratively by Dr. Louise McCann and Dr. Omar Franco.</code> |
  | <code>State the credit hour load for this section.</code>            | <code>Credit load: 3.5 hours</code>                                                          |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
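
As a rough guide, a fine-tuning run of this shape could be reproduced with the Sentence Transformers Trainer API as sketched below. The two training rows are illustrative pairs copied from the samples table above, not the actual 2,244-pair dataset, and the output directory name is arbitrary:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Stand-in for the unnamed (sentence_0, sentence_1) dataset of 2,244 pairs
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "Who teaches the course?",
        "State the credit hour load for this section.",
    ],
    "sentence_1": [
        "This course is taught collaboratively by Dr. Louise McCann and Dr. Omar Franco.",
        "Credit load: 3.5 hours",
    ],
})

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# MultipleNegativesRankingLoss treats the other in-batch positives as negatives
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # arbitrary
    num_train_epochs=25,
    per_device_train_batch_size=64,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```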

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `num_train_epochs`: 25
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 25
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch   | Step | Training Loss |
|:-------:|:----:|:-------------:|
| 13.8889 | 500  | 0.5206        |


### Framework Versions
- Python: 3.9.13
- Sentence Transformers: 4.1.0
- Transformers: 4.45.1
- PyTorch: 2.0.1+cpu
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.20.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->