noystl committed on
Commit 955beff · verified · 1 Parent(s): f30239c

Upload 11 files
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
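This config enables mean pooling (`pooling_mode_mean_tokens: true`): per-token embeddings are averaged over the real (non-padding) tokens to form one 768-dimensional sentence vector. A minimal numpy sketch of that operation; the array names here are illustrative, not taken from the library:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, dim) array of per-token vectors
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding)
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                 # (dim,)
    count = max(float(mask.sum()), 1e-9)                           # avoid division by zero
    return summed / count

# Example: two real tokens and one padding token (the padding row is ignored)
emb = np.array([[1.0, 3.0], [3.0, 1.0], [99.0, 99.0]])
mask = np.array([1, 1, 0])
print(mean_pool(emb, mask))  # [2. 2.]
```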
README.md ADDED
@@ -0,0 +1,644 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:784827
+ - loss:ContrastiveLoss
+ base_model: sentence-transformers/all-mpnet-base-v2
+ widget:
+ - source_sentence: 'Background: The study addresses the need for effective tools that
+ allow both novice and expert users to analyze the diversity of news coverage about
+ events. It highlights the importance of tailoring the interface to accommodate
+ non-expert users while also considering the insights of journalism-savvy users,
+ indicating a gap in existing systems that cater to varying levels of expertise
+ in news analysis.
+
+ Contribution: Combine ''a coordinated visualization interface tailored for visualization
+ non-expert users'' and '
+ sentences:
+ - a method considering lexical relationships
+ - cross-modality self-supervised learning via masked visual language modeling
+ - cognitive models of chaining
+ - source_sentence: 'Background: Existing methods for anomaly detection on dynamic
+ graphs struggle with capturing complex time information in graph structures and
+ generating effective negative samples for unsupervised learning. These challenges
+ highlight the need for improved methodologies that can address the limitations
+ of current approaches in this field.
+
+ Contribution: Combine ''a message-passing framework'' and '
+ sentences:
+ - the grouping task
+ - a forecaster
+ - the optimisation algorithm producing the learnable model
+ - source_sentence: 'Background: The accuracy of pixel flows is crucial for achieving
+ high-quality video enhancement, yet most prior works focus on estimating dense
+ flows that are generally less robust and computationally expensive. This highlights
+ a gap in existing methodologies that fail to prioritize accuracy over density,
+ necessitating a more efficient approach to flow estimation for video enhancement
+ tasks.
+
+ Contribution: Combine ''sparse point cloud data'' and '
+ sentences:
+ - a deep CNN
+ - a reinforcement learning view of the dialog generation task
+ - graphical models
+ - source_sentence: 'Background: The optimal robot assembly planning problem is challenging
+ due to the necessity of finding the optimal solution amongst an exponentially
+ vast number of possible plans while satisfying a selection of constraints. Traditional
+ heuristic methods are limited as they are specific to a given objective structure
+ or set of problem parameters, indicating a need for more versatile and effective
+ approaches.
+
+ Contribution: ''pos[e] assembly sequencing'' inspired by '
+ sentences:
+ - 3D geometric neural field representation
+ - prompt learning
+ - gestures
+ - source_sentence: 'Background: Patients find it difficult to use dexterous prosthetic
+ hands without a suitable control system, highlighting a need for improved grasp
+ performance and ease of operation. Existing methods may not adequately address
+ the challenges faced by users, particularly those with inferior myoelectric signals,
+ in effectively controlling prosthetic devices.
+
+ Contribution: Combine ''myoelectric signal'' and '
+ sentences:
+ - a unified framework for collaborative decoding between large and small language
+ models (Large Language Models and small language models)
+ - image understanding
+ - joint biomedical entity linking and event extraction
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 9a3225965996d404b775526de6dbfe85d3368642 -->
+ - **Maximum Sequence Length:** 384 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ )
+ ```
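The trailing `Normalize()` module rescales every embedding to unit L2 norm, which is why a plain dot product between two outputs equals their cosine similarity. A small numpy illustration of the idea (a sketch, not the library's own implementation):

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row vector to unit L2 norm, as the Normalize() module does."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

# Two vectors pointing in the same direction normalize to the same unit vector,
# so their dot product (= cosine similarity) is 1.
a = l2_normalize(np.array([[3.0, 4.0]]))
b = l2_normalize(np.array([[6.0, 8.0]]))
print(a @ b.T)  # [[1.]]
```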
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+ "Background: Patients find it difficult to use dexterous prosthetic hands without a suitable control system, highlighting a need for improved grasp performance and ease of operation. Existing methods may not adequately address the challenges faced by users, particularly those with inferior myoelectric signals, in effectively controlling prosthetic devices.\nContribution: Combine 'myoelectric signal' and ",
+ 'a unified framework for collaborative decoding between large and small language models (Large Language Models and small language models)',
+ 'joint biomedical entity linking and event extraction',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 784,827 training samples
+ * Columns: <code>query</code>, <code>answer</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | query | answer | label |
+ |:--------|:------|:-------|:------|
+ | type | string | string | int |
+ | details | <ul><li>min: 60 tokens</li><li>mean: 77.86 tokens</li><li>max: 93 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 8.82 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>0: ~96.70%</li><li>1: ~3.30%</li></ul> |
+ * Samples:
+ | query | answer | label |
+ |:------|:-------|:------|
+ | <code>Background: The study addresses the challenge of action segmentation under weak supervision, where the available ground truth only indicates the presence of actions without providing their temporal ordering or occurrence timing in training videos. This limitation necessitates the development of a method to generate pseudo-ground truth for effective training and improve performance in action segmentation and alignment tasks.<br>Contribution: Combine 'a Hidden Markov Model' and </code> | <code>a multilayer perceptron</code> | <code>1</code> |
+ | <code>Background: The study addresses the challenge of action segmentation under weak supervision, where the available ground truth only indicates the presence of actions without providing their temporal ordering or occurrence timing in training videos. This limitation necessitates the development of a method to generate pseudo-ground truth for effective training and improve performance in action segmentation and alignment tasks.<br>Contribution: Combine 'a Hidden Markov Model' and </code> | <code>synthetic occlusion augmentation during training</code> | <code>0</code> |
+ | <code>Background: The study addresses the challenge of action segmentation under weak supervision, where the available ground truth only indicates the presence of actions without providing their temporal ordering or occurrence timing in training videos. This limitation necessitates the development of a method to generate pseudo-ground truth for effective training and improve performance in action segmentation and alignment tasks.<br>Contribution: Combine 'a Hidden Markov Model' and </code> | <code>robustness of deep learning methods</code> | <code>0</code> |
+ * Loss: [<code>ContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#contrastiveloss) with these parameters:
+ ```json
+ {
+ "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
+ "margin": 0.5,
+ "size_average": true
+ }
+ ```
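For reference, ContrastiveLoss follows the Hadsell et al. (2006) formulation: similar pairs (label 1) are pulled toward distance 0, while dissimilar pairs (label 0) are only penalized while their distance is still below the margin (0.5 here). A hedged numpy sketch of that formula, using cosine distance as configured above; the library's exact scaling may differ slightly:

```python
import numpy as np

def contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss of Hadsell et al. (2006).

    distances: per-pair distances (here: cosine distance, 1 - cosine similarity)
    labels:    1 for similar pairs, 0 for dissimilar pairs
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels, dtype=float)
    pos = labels * distances**2                                   # pull similar pairs together
    neg = (1 - labels) * np.maximum(0.0, margin - distances) ** 2  # push dissimilar pairs past the margin
    return 0.5 * np.mean(pos + neg)  # "size_average": true -> mean over pairs

# A similar pair at distance 0 and a dissimilar pair already beyond the
# margin both contribute zero loss:
print(contrastive_loss([0.0, 0.9], [1, 0], margin=0.5))  # 0.0
```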
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 64
+ - `learning_rate`: 1.9218937402834593e-05
+ - `num_train_epochs`: 2
+ - `warmup_ratio`: 0.08278167292320517
+ - `bf16`: True
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 1.9218937402834593e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 2
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.08278167292320517
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss |
+ |:------:|:-----:|:-------------:|
+ | 0.0082 | 100 | 0.0104 |
+ | 0.0163 | 200 | 0.0068 |
+ | 0.0245 | 300 | 0.005 |
+ | 0.0326 | 400 | 0.0041 |
+ | 0.0408 | 500 | 0.0054 |
+ | 0.0489 | 600 | 0.004 |
+ | 0.0571 | 700 | 0.0037 |
+ | 0.0652 | 800 | 0.0037 |
+ | 0.0734 | 900 | 0.0049 |
+ | 0.0815 | 1000 | 0.0038 |
+ | 0.0897 | 1100 | 0.004 |
+ | 0.0979 | 1200 | 0.0037 |
+ | 0.1060 | 1300 | 0.004 |
+ | 0.1142 | 1400 | 0.0049 |
+ | 0.1223 | 1500 | 0.0038 |
+ | 0.1305 | 1600 | 0.0036 |
+ | 0.1386 | 1700 | 0.0037 |
+ | 0.1468 | 1800 | 0.0045 |
+ | 0.1549 | 1900 | 0.0038 |
+ | 0.1631 | 2000 | 0.0034 |
+ | 0.1712 | 2100 | 0.0034 |
+ | 0.1794 | 2200 | 0.0035 |
+ | 0.1876 | 2300 | 0.0045 |
+ | 0.1957 | 2400 | 0.0036 |
+ | 0.2039 | 2500 | 0.0036 |
+ | 0.2120 | 2600 | 0.0033 |
+ | 0.2202 | 2700 | 0.004 |
+ | 0.2283 | 2800 | 0.0036 |
+ | 0.2365 | 2900 | 0.0033 |
+ | 0.2446 | 3000 | 0.0033 |
+ | 0.2528 | 3100 | 0.0037 |
+ | 0.2609 | 3200 | 0.0038 |
+ | 0.2691 | 3300 | 0.0033 |
+ | 0.2773 | 3400 | 0.0034 |
+ | 0.2854 | 3500 | 0.0033 |
+ | 0.2936 | 3600 | 0.0041 |
+ | 0.3017 | 3700 | 0.0033 |
+ | 0.3099 | 3800 | 0.0033 |
+ | 0.3180 | 3900 | 0.0032 |
+ | 0.3262 | 4000 | 0.004 |
+ | 0.3343 | 4100 | 0.0035 |
+ | 0.3425 | 4200 | 0.0031 |
+ | 0.3506 | 4300 | 0.0033 |
+ | 0.3588 | 4400 | 0.0033 |
+ | 0.3670 | 4500 | 0.0039 |
+ | 0.3751 | 4600 | 0.0032 |
+ | 0.3833 | 4700 | 0.0034 |
+ | 0.3914 | 4800 | 0.0031 |
+ | 0.3996 | 4900 | 0.004 |
+ | 0.4077 | 5000 | 0.0032 |
+ | 0.4159 | 5100 | 0.0031 |
+ | 0.4240 | 5200 | 0.0031 |
+ | 0.4322 | 5300 | 0.0032 |
+ | 0.4403 | 5400 | 0.0039 |
+ | 0.4485 | 5500 | 0.0031 |
+ | 0.4567 | 5600 | 0.003 |
+ | 0.4648 | 5700 | 0.0032 |
+ | 0.4730 | 5800 | 0.0038 |
+ | 0.4811 | 5900 | 0.0033 |
+ | 0.4893 | 6000 | 0.0031 |
+ | 0.4974 | 6100 | 0.0032 |
+ | 0.5056 | 6200 | 0.0033 |
+ | 0.5137 | 6300 | 0.0033 |
+ | 0.5219 | 6400 | 0.0032 |
+ | 0.5300 | 6500 | 0.0031 |
+ | 0.5382 | 6600 | 0.0032 |
+ | 0.5464 | 6700 | 0.0038 |
+ | 0.5545 | 6800 | 0.003 |
+ | 0.5627 | 6900 | 0.003 |
+ | 0.5708 | 7000 | 0.0029 |
+ | 0.5790 | 7100 | 0.0038 |
+ | 0.5871 | 7200 | 0.0032 |
+ | 0.5953 | 7300 | 0.0031 |
+ | 0.6034 | 7400 | 0.003 |
+ | 0.6116 | 7500 | 0.003 |
+ | 0.6198 | 7600 | 0.0039 |
+ | 0.6279 | 7700 | 0.0031 |
+ | 0.6361 | 7800 | 0.0031 |
+ | 0.6442 | 7900 | 0.0031 |
+ | 0.6524 | 8000 | 0.0039 |
+ | 0.6605 | 8100 | 0.003 |
+ | 0.6687 | 8200 | 0.003 |
+ | 0.6768 | 8300 | 0.003 |
+ | 0.6850 | 8400 | 0.0028 |
+ | 0.6931 | 8500 | 0.0035 |
+ | 0.7013 | 8600 | 0.0031 |
+ | 0.7095 | 8700 | 0.003 |
+ | 0.7176 | 8800 | 0.0026 |
+ | 0.7258 | 8900 | 0.0034 |
+ | 0.7339 | 9000 | 0.0033 |
+ | 0.7421 | 9100 | 0.003 |
+ | 0.7502 | 9200 | 0.0027 |
+ | 0.7584 | 9300 | 0.0029 |
+ | 0.7665 | 9400 | 0.0034 |
+ | 0.7747 | 9500 | 0.0029 |
+ | 0.7828 | 9600 | 0.0028 |
+ | 0.7910 | 9700 | 0.0027 |
+ | 0.7992 | 9800 | 0.0033 |
+ | 0.8073 | 9900 | 0.0031 |
+ | 0.8155 | 10000 | 0.0029 |
+ | 0.8236 | 10100 | 0.0028 |
+ | 0.8318 | 10200 | 0.0031 |
+ | 0.8399 | 10300 | 0.0031 |
+ | 0.8481 | 10400 | 0.003 |
+ | 0.8562 | 10500 | 0.0029 |
+ | 0.8644 | 10600 | 0.0028 |
+ | 0.8725 | 10700 | 0.0033 |
+ | 0.8807 | 10800 | 0.003 |
+ | 0.8889 | 10900 | 0.0029 |
+ | 0.8970 | 11000 | 0.0027 |
+ | 0.9052 | 11100 | 0.0033 |
+ | 0.9133 | 11200 | 0.0029 |
+ | 0.9215 | 11300 | 0.0029 |
+ | 0.9296 | 11400 | 0.0029 |
+ | 0.9378 | 11500 | 0.003 |
+ | 0.9459 | 11600 | 0.0034 |
+ | 0.9541 | 11700 | 0.0031 |
+ | 0.9622 | 11800 | 0.0027 |
+ | 0.9704 | 11900 | 0.0029 |
+ | 0.9786 | 12000 | 0.0034 |
+ | 0.9867 | 12100 | 0.0032 |
+ | 0.9949 | 12200 | 0.003 |
+ | 1.0030 | 12300 | 0.0032 |
+ | 1.0112 | 12400 | 0.0028 |
+ | 1.0193 | 12500 | 0.003 |
+ | 1.0275 | 12600 | 0.0027 |
+ | 1.0356 | 12700 | 0.0034 |
+ | 1.0438 | 12800 | 0.0029 |
+ | 1.0519 | 12900 | 0.0025 |
+ | 1.0601 | 13000 | 0.0028 |
+ | 1.0683 | 13100 | 0.0026 |
+ | 1.0764 | 13200 | 0.0035 |
+ | 1.0846 | 13300 | 0.0026 |
+ | 1.0927 | 13400 | 0.0028 |
+ | 1.1009 | 13500 | 0.0026 |
+ | 1.1090 | 13600 | 0.0034 |
+ | 1.1172 | 13700 | 0.0028 |
+ | 1.1253 | 13800 | 0.0027 |
+ | 1.1335 | 13900 | 0.0026 |
+ | 1.1416 | 14000 | 0.0031 |
+ | 1.1498 | 14100 | 0.0025 |
+ | 1.1580 | 14200 | 0.0025 |
+ | 1.1661 | 14300 | 0.0025 |
+ | 1.1743 | 14400 | 0.0024 |
+ | 1.1824 | 14500 | 0.0031 |
+ | 1.1906 | 14600 | 0.0025 |
+ | 1.1987 | 14700 | 0.0024 |
+ | 1.2069 | 14800 | 0.0025 |
+ | 1.2150 | 14900 | 0.0029 |
+ | 1.2232 | 15000 | 0.0025 |
+ | 1.2313 | 15100 | 0.0025 |
+ | 1.2395 | 15200 | 0.0023 |
+ | 1.2477 | 15300 | 0.0024 |
+ | 1.2558 | 15400 | 0.0029 |
+ | 1.2640 | 15500 | 0.0023 |
+ | 1.2721 | 15600 | 0.0023 |
+ | 1.2803 | 15700 | 0.0023 |
+ | 1.2884 | 15800 | 0.0032 |
+ | 1.2966 | 15900 | 0.0023 |
+ | 1.3047 | 16000 | 0.0023 |
+ | 1.3129 | 16100 | 0.0024 |
+ | 1.3210 | 16200 | 0.0025 |
+ | 1.3292 | 16300 | 0.0028 |
+ | 1.3374 | 16400 | 0.0023 |
+ | 1.3455 | 16500 | 0.0021 |
+ | 1.3537 | 16600 | 0.0023 |
+ | 1.3618 | 16700 | 0.0029 |
+ | 1.3700 | 16800 | 0.0023 |
+ | 1.3781 | 16900 | 0.0023 |
+ | 1.3863 | 17000 | 0.0025 |
+ | 1.3944 | 17100 | 0.0028 |
+ | 1.4026 | 17200 | 0.0023 |
+ | 1.4107 | 17300 | 0.0023 |
+ | 1.4189 | 17400 | 0.0023 |
+ | 1.4271 | 17500 | 0.0023 |
+ | 1.4352 | 17600 | 0.0029 |
+ | 1.4434 | 17700 | 0.0022 |
+ | 1.4515 | 17800 | 0.0022 |
+ | 1.4597 | 17900 | 0.0023 |
+ | 1.4678 | 18000 | 0.0026 |
+ | 1.4760 | 18100 | 0.0024 |
+ | 1.4841 | 18200 | 0.0023 |
+ | 1.4923 | 18300 | 0.0024 |
+ | 1.5004 | 18400 | 0.0024 |
+ | 1.5086 | 18500 | 0.0026 |
+ | 1.5168 | 18600 | 0.0022 |
+ | 1.5249 | 18700 | 0.0023 |
+ | 1.5331 | 18800 | 0.0023 |
+ | 1.5412 | 18900 | 0.003 |
+ | 1.5494 | 19000 | 0.002 |
+ | 1.5575 | 19100 | 0.0022 |
+ | 1.5657 | 19200 | 0.0023 |
+ | 1.5738 | 19300 | 0.0023 |
+ | 1.5820 | 19400 | 0.0028 |
+ | 1.5901 | 19500 | 0.0022 |
+ | 1.5983 | 19600 | 0.0023 |
+ | 1.6065 | 19700 | 0.0022 |
+ | 1.6146 | 19800 | 0.0028 |
+ | 1.6228 | 19900 | 0.0022 |
+ | 1.6309 | 20000 | 0.0023 |
+ | 1.6391 | 20100 | 0.0025 |
+ | 1.6472 | 20200 | 0.0028 |
+ | 1.6554 | 20300 | 0.0023 |
+ | 1.6635 | 20400 | 0.0021 |
+ | 1.6717 | 20500 | 0.0022 |
+ | 1.6798 | 20600 | 0.0022 |
+ | 1.6880 | 20700 | 0.0025 |
+ | 1.6962 | 20800 | 0.0024 |
+ | 1.7043 | 20900 | 0.0023 |
+ | 1.7125 | 21000 | 0.0021 |
+ | 1.7206 | 21100 | 0.0024 |
+ | 1.7288 | 21200 | 0.0024 |
+ | 1.7369 | 21300 | 0.0023 |
+ | 1.7451 | 21400 | 0.0022 |
+ | 1.7532 | 21500 | 0.0021 |
+ | 1.7614 | 21600 | 0.0025 |
+ | 1.7696 | 21700 | 0.0023 |
+ | 1.7777 | 21800 | 0.002 |
+ | 1.7859 | 21900 | 0.0022 |
+ | 1.7940 | 22000 | 0.0025 |
+ | 1.8022 | 22100 | 0.0022 |
+ | 1.8103 | 22200 | 0.0023 |
+ | 1.8185 | 22300 | 0.0022 |
+ | 1.8266 | 22400 | 0.0021 |
+ | 1.8348 | 22500 | 0.0025 |
+ | 1.8429 | 22600 | 0.0025 |
+ | 1.8511 | 22700 | 0.0022 |
+ | 1.8593 | 22800 | 0.0023 |
+ | 1.8674 | 22900 | 0.0026 |
+ | 1.8756 | 23000 | 0.0022 |
+ | 1.8837 | 23100 | 0.0022 |
+ | 1.8919 | 23200 | 0.0022 |
+ | 1.9000 | 23300 | 0.0024 |
+ | 1.9082 | 23400 | 0.0022 |
+ | 1.9163 | 23500 | 0.0022 |
+ | 1.9245 | 23600 | 0.0023 |
+ | 1.9326 | 23700 | 0.0023 |
+ | 1.9408 | 23800 | 0.0027 |
+ | 1.9490 | 23900 | 0.0023 |
+ | 1.9571 | 24000 | 0.0023 |
+ | 1.9653 | 24100 | 0.0022 |
+ | 1.9734 | 24200 | 0.0027 |
+ | 1.9816 | 24300 | 0.0025 |
+ | 1.9897 | 24400 | 0.0023 |
+ | 1.9979 | 24500 | 0.0025 |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.11.2
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.49.0
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.0.1
+ - Datasets: 3.1.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### ContrastiveLoss
+ ```bibtex
+ @inproceedings{hadsell2006dimensionality,
+ author={Hadsell, R. and Chopra, S. and LeCun, Y.},
+ booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
+ title={Dimensionality Reduction by Learning an Invariant Mapping},
+ year={2006},
+ volume={2},
+ number={},
+ pages={1735-1742},
+ doi={10.1109/CVPR.2006.100}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
+ "architectures": [
+ "MPNetModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 514,
+ "model_type": "mpnet",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "relative_attention_num_buckets": 32,
+ "torch_dtype": "float32",
+ "transformers_version": "4.49.0",
+ "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.3.1",
+ "transformers": "4.49.0",
+ "pytorch": "2.5.1+cu124"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bdeeeff8db3c0d7dbbbf8f3886638937aa8364363c4e1641c648739c25883412
+ size 437967672
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 384,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,73 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "104": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "30526": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<s>",
+ "do_lower_case": true,
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "max_length": 128,
+ "model_max_length": 384,
+ "pad_to_multiple_of": null,
+ "pad_token": "<pad>",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "</s>",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "MPNetTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff