Commit 1f41e11 by Davidsamuel101 · verified · 1 Parent(s): dfda5dc

Add new CrossEncoder model

Files changed (7)
  1. README.md +514 -0
  2. config.json +36 -0
  3. model.safetensors +3 -0
  4. special_tokens_map.json +37 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +58 -0
  7. vocab.txt +0 -0
README.md ADDED
@@ -0,0 +1,514 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - cross-encoder
5
+ - generated_from_trainer
6
+ - dataset_size:23770
7
+ - loss:MultipleNegativesRankingLoss
8
+ base_model: cross-encoder/ms-marco-MiniLM-L12-v2
9
+ pipeline_tag: text-ranking
10
+ library_name: sentence-transformers
11
+ metrics:
12
+ - map
13
+ - mrr@5
14
+ - ndcg@5
15
+ model-index:
16
+ - name: CrossEncoder based on cross-encoder/ms-marco-MiniLM-L12-v2
17
+ results:
18
+ - task:
19
+ type: cross-encoder-reranking
20
+ name: Cross Encoder Reranking
21
+ dataset:
22
+ name: claims evidence dev
23
+ type: claims-evidence-dev
24
+ metrics:
25
+ - type: map
26
+ value: 0.9904
27
+ name: Map
28
+ - type: mrr@5
29
+ value: 1.0
30
+ name: Mrr@5
31
+ - type: ndcg@5
32
+ value: 0.9882
33
+ name: Ndcg@5
34
+ ---
35
+
36
+ # CrossEncoder based on cross-encoder/ms-marco-MiniLM-L12-v2
37
+
38
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model fine-tuned from [cross-encoder/ms-marco-MiniLM-L12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L12-v2) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
39
+
40
+ ## Model Details
41
+
42
+ ### Model Description
43
+ - **Model Type:** Cross Encoder
44
+ - **Base model:** [cross-encoder/ms-marco-MiniLM-L12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L12-v2) <!-- at revision a34da8fab3ad458d48778dea3276ce729857efaf -->
45
+ - **Maximum Sequence Length:** 512 tokens
46
+ - **Number of Output Labels:** 1 label
47
+ <!-- - **Training Dataset:** Unknown -->
48
+ <!-- - **Language:** Unknown -->
49
+ <!-- - **License:** Unknown -->
50
+
51
+ ### Model Sources
52
+
53
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
54
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
55
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
56
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
57
+
58
+ ## Usage
59
+
60
+ ### Direct Usage (Sentence Transformers)
61
+
62
+ First install the Sentence Transformers library:
63
+
64
+ ```bash
65
+ pip install -U sentence-transformers
66
+ ```
67
+
68
+ Then you can load this model and run inference.
69
+ ```python
+ from sentence_transformers import CrossEncoder
+
+ # Download from the 🤗 Hub
+ model = CrossEncoder("Davidsamuel101/ft-ms-marco-MiniLM-L12-v2-claims-reranker-v2")
+ # Get scores for pairs of texts
+ pairs = [
+     ['Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life.', 'At very high concentrations (100 times atmospheric concentration, or greater), carbon dioxide can be toxic to animal life, so raising the concentration to 10,000 ppm (1%) or higher for several hours will eliminate pests such as whiteflies and spider mites in a greenhouse.'],
+     ['Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life.', 'Plants can grow as much as 50 percent faster in concentrations of 1,000 ppm CO 2 when compared with ambient conditions, though this assumes no change in climate and no limitation on other nutrients.'],
+     ['Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life.', 'Higher carbon dioxide concentrations will favourably affect plant growth and demand for water.'],
+     ['Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life.', "Carbon dioxide in the Earth's atmosphere is essential to life and to most of the planetary biosphere."],
+     ['Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life.', 'Rennie 2009: "Claim 1: Anthropogenic CO2 can\'t be changing climate, because CO2 is only a trace gas in the atmosphere and the amount produced by humans is dwarfed by the amount from volcanoes and other natural sources.'],
+ ]
+ scores = model.predict(pairs)
+ print(scores.shape)
+ # (5,)
+
+ # Or rank different texts based on similarity to a single text
+ ranks = model.rank(
+     'Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life.',
+     [
+         'At very high concentrations (100 times atmospheric concentration, or greater), carbon dioxide can be toxic to animal life, so raising the concentration to 10,000 ppm (1%) or higher for several hours will eliminate pests such as whiteflies and spider mites in a greenhouse.',
+         'Plants can grow as much as 50 percent faster in concentrations of 1,000 ppm CO 2 when compared with ambient conditions, though this assumes no change in climate and no limitation on other nutrients.',
+         'Higher carbon dioxide concentrations will favourably affect plant growth and demand for water.',
+         "Carbon dioxide in the Earth's atmosphere is essential to life and to most of the planetary biosphere.",
+         'Rennie 2009: "Claim 1: Anthropogenic CO2 can\'t be changing climate, because CO2 is only a trace gas in the atmosphere and the amount produced by humans is dwarfed by the amount from volcanoes and other natural sources.',
+     ]
+ )
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
+ ```
99
+
100
+ <!--
101
+ ### Direct Usage (Transformers)
102
+
103
+ <details><summary>Click to see the direct usage in Transformers</summary>
104
+
105
+ </details>
106
+ -->
107
+
108
+ <!--
109
+ ### Downstream Usage (Sentence Transformers)
110
+
111
+ You can finetune this model on your own dataset.
112
+
113
+ <details><summary>Click to expand</summary>
114
+
115
+ </details>
116
+ -->
117
+
118
+ <!--
119
+ ### Out-of-Scope Use
120
+
121
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
122
+ -->
123
+
124
+ ## Evaluation
125
+
126
+ ### Metrics
127
+
128
+ #### Cross Encoder Reranking
129
+
130
+ * Dataset: `claims-evidence-dev`
131
+ * Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
132
+ ```json
133
+ {
134
+ "at_k": 5,
135
+ "always_rerank_positives": true
136
+ }
137
+ ```
138
+
139
+ | Metric     | Value                |
+ |:-----------|:---------------------|
+ | map        | 0.9904 (-0.0096)     |
+ | mrr@5      | 1.0000 (+0.0000)     |
+ | **ndcg@5** | **0.9882 (-0.0118)** |
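
The three metrics in this table can be worked out by hand for a single query. The following is a minimal sketch in plain Python, not the `CrossEncoderRerankingEvaluator` implementation; the function names and the toy relevance list are hypothetical and only illustrate the usual definitions for binary relevance.

```python
import math

def reciprocal_rank_at_k(relevance, k):
    """1 / rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def average_precision(relevance):
    """Mean of precision@i over the positions i that hold a relevant item."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def ndcg_at_k(relevance, k):
    """DCG of the top k divided by the DCG of an ideal ordering."""
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0

# Hypothetical ranking for one claim: 1 = evidence, 0 = non-evidence,
# listed in the order the model ranked the candidates.
ranked = [1, 0, 1, 0, 0, 0]
print(reciprocal_rank_at_k(ranked, 5), average_precision(ranked), ndcg_at_k(ranked, 5))
```

Averaging each function over all dev queries gives dataset-level numbers of the kind reported here. Under these definitions, an MRR@5 of 1.0 is consistent with every dev query having a relevant passage in first position.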
144
+
145
+ <!--
146
+ ## Bias, Risks and Limitations
147
+
148
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
149
+ -->
150
+
151
+ <!--
152
+ ### Recommendations
153
+
154
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
155
+ -->
156
+
157
+ ## Training Details
158
+
159
+ ### Training Dataset
160
+
161
+ #### Unnamed Dataset
162
+
163
+ * Size: 23,770 training samples
164
+ * Columns: <code>text1</code>, <code>text2</code>, and <code>label</code>
165
+ * Approximate statistics based on the first 1000 samples:
166
+ | | text1 | text2 | label |
167
+ |:--------|:-------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:------------------------------------------------|
168
+ | type | string | string | int |
169
+ | details | <ul><li>min: 38 characters</li><li>mean: 118.57 characters</li><li>max: 226 characters</li></ul> | <ul><li>min: 14 characters</li><li>mean: 144.96 characters</li><li>max: 1176 characters</li></ul> | <ul><li>0: ~83.70%</li><li>1: ~16.30%</li></ul> |
170
+ * Samples:
171
+ | text1 | text2 | label |
172
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
173
+ | <code>Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life.</code> | <code>At very high concentrations (100 times atmospheric concentration, or greater), carbon dioxide can be toxic to animal life, so raising the concentration to 10,000 ppm (1%) or higher for several hours will eliminate pests such as whiteflies and spider mites in a greenhouse.</code> | <code>1</code> |
174
+ | <code>Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life.</code> | <code>Plants can grow as much as 50 percent faster in concentrations of 1,000 ppm CO 2 when compared with ambient conditions, though this assumes no change in climate and no limitation on other nutrients.</code> | <code>1</code> |
175
+ | <code>Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life.</code> | <code>Higher carbon dioxide concentrations will favourably affect plant growth and demand for water.</code> | <code>1</code> |
176
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#multiplenegativesrankingloss) with these parameters:
177
+ ```json
178
+ {
179
+ "scale": 10.0,
180
+ "num_negatives": 4,
181
+ "activation_fn": "torch.nn.modules.activation.Sigmoid"
182
+ }
183
+ ```
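
To make these parameters concrete, here is a rough sketch of a ranking loss of this shape for one query with one positive and `num_negatives = 4` negatives. It assumes each raw score is passed through a sigmoid, multiplied by `scale`, and fed into a softmax cross-entropy with the positive as the target. This is a plain-Python illustration under that assumption, not the library's implementation; `mnr_loss` and the example logits are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mnr_loss(positive_logit, negative_logits, scale=10.0):
    """Softmax cross-entropy over [positive, negatives], positive as target."""
    scaled = [scale * sigmoid(z) for z in [positive_logit, *negative_logits]]
    # log-sum-exp with max subtraction for numerical stability
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_z - scaled[0]

# All five candidates score the same: the loss equals ln(5)
print(mnr_loss(0.0, [0.0, 0.0, 0.0, 0.0]))
# Positive scored high, negatives scored low: the loss is close to 0
print(mnr_loss(8.0, [-8.0, -8.0, -8.0, -8.0]))
```

Because the sigmoid keeps every scaled score between 0 and `scale`, the loss in this sketch can approach zero but never reach it exactly.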
184
+
185
+ ### Training Hyperparameters
186
+ #### Non-Default Hyperparameters
187
+
188
+ - `eval_strategy`: steps
189
+ - `per_device_train_batch_size`: 16
190
+ - `learning_rate`: 3e-06
191
+ - `num_train_epochs`: 5
192
+ - `bf16`: True
193
+ - `load_best_model_at_end`: True
194
+
195
+ #### All Hyperparameters
196
+ <details><summary>Click to expand</summary>
197
+
198
+ - `overwrite_output_dir`: False
199
+ - `do_predict`: False
200
+ - `eval_strategy`: steps
201
+ - `prediction_loss_only`: True
202
+ - `per_device_train_batch_size`: 16
203
+ - `per_device_eval_batch_size`: 8
204
+ - `per_gpu_train_batch_size`: None
205
+ - `per_gpu_eval_batch_size`: None
206
+ - `gradient_accumulation_steps`: 1
207
+ - `eval_accumulation_steps`: None
208
+ - `torch_empty_cache_steps`: None
209
+ - `learning_rate`: 3e-06
210
+ - `weight_decay`: 0.0
211
+ - `adam_beta1`: 0.9
212
+ - `adam_beta2`: 0.999
213
+ - `adam_epsilon`: 1e-08
214
+ - `max_grad_norm`: 1.0
215
+ - `num_train_epochs`: 5
216
+ - `max_steps`: -1
217
+ - `lr_scheduler_type`: linear
218
+ - `lr_scheduler_kwargs`: {}
219
+ - `warmup_ratio`: 0.0
220
+ - `warmup_steps`: 0
221
+ - `log_level`: passive
222
+ - `log_level_replica`: warning
223
+ - `log_on_each_node`: True
224
+ - `logging_nan_inf_filter`: True
225
+ - `save_safetensors`: True
226
+ - `save_on_each_node`: False
227
+ - `save_only_model`: False
228
+ - `restore_callback_states_from_checkpoint`: False
229
+ - `no_cuda`: False
230
+ - `use_cpu`: False
231
+ - `use_mps_device`: False
232
+ - `seed`: 42
233
+ - `data_seed`: None
234
+ - `jit_mode_eval`: False
235
+ - `use_ipex`: False
236
+ - `bf16`: True
237
+ - `fp16`: False
238
+ - `fp16_opt_level`: O1
239
+ - `half_precision_backend`: auto
240
+ - `bf16_full_eval`: False
241
+ - `fp16_full_eval`: False
242
+ - `tf32`: None
243
+ - `local_rank`: 0
244
+ - `ddp_backend`: None
245
+ - `tpu_num_cores`: None
246
+ - `tpu_metrics_debug`: False
247
+ - `debug`: []
248
+ - `dataloader_drop_last`: False
249
+ - `dataloader_num_workers`: 0
250
+ - `dataloader_prefetch_factor`: None
251
+ - `past_index`: -1
252
+ - `disable_tqdm`: False
253
+ - `remove_unused_columns`: True
254
+ - `label_names`: None
255
+ - `load_best_model_at_end`: True
256
+ - `ignore_data_skip`: False
257
+ - `fsdp`: []
258
+ - `fsdp_min_num_params`: 0
259
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
260
+ - `tp_size`: 0
261
+ - `fsdp_transformer_layer_cls_to_wrap`: None
262
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
263
+ - `deepspeed`: None
264
+ - `label_smoothing_factor`: 0.0
265
+ - `optim`: adamw_torch
266
+ - `optim_args`: None
267
+ - `adafactor`: False
268
+ - `group_by_length`: False
269
+ - `length_column_name`: length
270
+ - `ddp_find_unused_parameters`: None
271
+ - `ddp_bucket_cap_mb`: None
272
+ - `ddp_broadcast_buffers`: False
273
+ - `dataloader_pin_memory`: True
274
+ - `dataloader_persistent_workers`: False
275
+ - `skip_memory_metrics`: True
276
+ - `use_legacy_prediction_loop`: False
277
+ - `push_to_hub`: False
278
+ - `resume_from_checkpoint`: None
279
+ - `hub_model_id`: None
280
+ - `hub_strategy`: every_save
281
+ - `hub_private_repo`: None
282
+ - `hub_always_push`: False
283
+ - `gradient_checkpointing`: False
284
+ - `gradient_checkpointing_kwargs`: None
285
+ - `include_inputs_for_metrics`: False
286
+ - `include_for_metrics`: []
287
+ - `eval_do_concat_batches`: True
288
+ - `fp16_backend`: auto
289
+ - `push_to_hub_model_id`: None
290
+ - `push_to_hub_organization`: None
291
+ - `mp_parameters`:
292
+ - `auto_find_batch_size`: False
293
+ - `full_determinism`: False
294
+ - `torchdynamo`: None
295
+ - `ray_scope`: last
296
+ - `ddp_timeout`: 1800
297
+ - `torch_compile`: False
298
+ - `torch_compile_backend`: None
299
+ - `torch_compile_mode`: None
300
+ - `include_tokens_per_second`: False
301
+ - `include_num_input_tokens_seen`: False
302
+ - `neftune_noise_alpha`: None
303
+ - `optim_target_modules`: None
304
+ - `batch_eval_metrics`: False
305
+ - `eval_on_start`: False
306
+ - `use_liger_kernel`: False
307
+ - `eval_use_gather_object`: False
308
+ - `average_tokens_across_devices`: False
309
+ - `prompts`: None
310
+ - `batch_sampler`: batch_sampler
311
+ - `multi_dataset_batch_sampler`: proportional
312
+
313
+ </details>
314
+
315
+ ### Training Logs
316
+ <details><summary>Click to expand</summary>
317
+
318
+ | Epoch | Step | Training Loss | claims-evidence-dev_ndcg@5 |
319
+ |:----------:|:-------:|:-------------:|:--------------------------:|
320
+ | 0.0336 | 50 | 1.2496 | - |
321
+ | **0.0673** | **100** | **1.2605** | **0.9523 (-0.0477)** |
322
+ | 0.1009 | 150 | 1.1969 | - |
323
+ | 0.1346 | 200 | 1.2353 | 0.9529 (-0.0471) |
324
+ | 0.1682 | 250 | 1.2114 | - |
325
+ | 0.2019 | 300 | 1.1438 | 0.9551 (-0.0449) |
326
+ | 0.2355 | 350 | 1.2062 | - |
327
+ | 0.2692 | 400 | 1.1631 | 0.9568 (-0.0432) |
328
+ | 0.3028 | 450 | 1.115 | - |
329
+ | 0.3365 | 500 | 1.2029 | 0.9582 (-0.0418) |
330
+ | 0.3701 | 550 | 1.0615 | - |
331
+ | 0.4038 | 600 | 1.185 | 0.9649 (-0.0351) |
332
+ | 0.4374 | 650 | 1.0651 | - |
333
+ | 0.4711 | 700 | 1.0951 | 0.9682 (-0.0318) |
334
+ | 0.5047 | 750 | 1.1267 | - |
335
+ | 0.5384 | 800 | 1.0822 | 0.9727 (-0.0273) |
336
+ | 0.5720 | 850 | 1.0658 | - |
337
+ | 0.6057 | 900 | 1.0113 | 0.9785 (-0.0215) |
338
+ | 0.6393 | 950 | 1.0578 | - |
339
+ | 0.6729 | 1000 | 1.074 | 0.9829 (-0.0171) |
340
+ | 0.7066 | 1050 | 1.0287 | - |
341
+ | 0.7402 | 1100 | 0.9337 | 0.9873 (-0.0127) |
342
+ | 0.7739 | 1150 | 0.9798 | - |
343
+ | 0.8075 | 1200 | 0.9697 | 0.9899 (-0.0101) |
344
+ | 0.8412 | 1250 | 0.984 | - |
345
+ | 0.8748 | 1300 | 0.9913 | 0.9898 (-0.0102) |
346
+ | 0.9085 | 1350 | 1.0126 | - |
347
+ | 0.9421 | 1400 | 0.9458 | 0.9897 (-0.0103) |
348
+ | 0.9758 | 1450 | 0.9594 | - |
349
+ | 1.0094 | 1500 | 0.9798 | 0.9896 (-0.0104) |
350
+ | 1.0431 | 1550 | 0.9599 | - |
351
+ | 1.0767 | 1600 | 0.9485 | 0.9887 (-0.0113) |
352
+ | 1.1104 | 1650 | 0.9021 | - |
353
+ | 1.1440 | 1700 | 0.9778 | 0.9887 (-0.0113) |
354
+ | 1.1777 | 1750 | 0.9836 | - |
355
+ | 1.2113 | 1800 | 0.939 | 0.9912 (-0.0088) |
356
+ | 1.2450 | 1850 | 0.9476 | - |
357
+ | 1.2786 | 1900 | 0.964 | 0.9914 (-0.0086) |
358
+ | 1.3122 | 1950 | 0.9238 | - |
359
+ | 1.3459 | 2000 | 0.9811 | 0.9895 (-0.0105) |
360
+ | 1.3795 | 2050 | 0.905 | - |
361
+ | 1.4132 | 2100 | 0.8979 | 0.9896 (-0.0104) |
362
+ | 1.4468 | 2150 | 0.8998 | - |
363
+ | 1.4805 | 2200 | 0.9016 | 0.9896 (-0.0104) |
364
+ | 1.5141 | 2250 | 0.9183 | - |
365
+ | 1.5478 | 2300 | 0.8805 | 0.9896 (-0.0104) |
366
+ | 1.5814 | 2350 | 0.8672 | - |
367
+ | 1.6151 | 2400 | 0.8822 | 0.9896 (-0.0104) |
368
+ | 1.6487 | 2450 | 0.8724 | - |
369
+ | 1.6824 | 2500 | 0.9397 | 0.9883 (-0.0117) |
370
+ | 1.7160 | 2550 | 0.8903 | - |
371
+ | 1.7497 | 2600 | 0.9305 | 0.9882 (-0.0118) |
372
+ | 1.7833 | 2650 | 0.8741 | - |
373
+ | 1.8170 | 2700 | 0.8951 | 0.9874 (-0.0126) |
374
+ | 1.8506 | 2750 | 0.8958 | - |
375
+ | 1.8843 | 2800 | 0.8529 | 0.9873 (-0.0127) |
376
+ | 1.9179 | 2850 | 0.9468 | - |
377
+ | 1.9515 | 2900 | 0.8683 | 0.9882 (-0.0118) |
378
+ | 1.9852 | 2950 | 0.9145 | - |
379
+ | 2.0188 | 3000 | 0.9137 | 0.9883 (-0.0117) |
380
+ | 2.0525 | 3050 | 0.8175 | - |
381
+ | 2.0861 | 3100 | 0.911 | 0.9883 (-0.0117) |
382
+ | 2.1198 | 3150 | 0.8749 | - |
383
+ | 2.1534 | 3200 | 0.8491 | 0.9883 (-0.0117) |
384
+ | 2.1871 | 3250 | 0.9057 | - |
385
+ | 2.2207 | 3300 | 0.9034 | 0.9882 (-0.0118) |
386
+ | 2.2544 | 3350 | 0.8505 | - |
387
+ | 2.2880 | 3400 | 0.8762 | 0.9883 (-0.0117) |
388
+ | 2.3217 | 3450 | 0.8974 | - |
389
+ | 2.3553 | 3500 | 0.8832 | 0.9884 (-0.0116) |
390
+ | 2.3890 | 3550 | 0.851 | - |
391
+ | 2.4226 | 3600 | 0.8584 | 0.9890 (-0.0110) |
392
+ | 2.4563 | 3650 | 0.9032 | - |
393
+ | 2.4899 | 3700 | 0.8963 | 0.9893 (-0.0107) |
394
+ | 2.5236 | 3750 | 0.8756 | - |
395
+ | 2.5572 | 3800 | 0.843 | 0.9882 (-0.0118) |
396
+ | 2.5908 | 3850 | 0.8778 | - |
397
+ | 2.6245 | 3900 | 0.8434 | 0.9882 (-0.0118) |
398
+ | 2.6581 | 3950 | 0.9193 | - |
399
+ | 2.6918 | 4000 | 0.8724 | 0.9875 (-0.0125) |
400
+ | 2.7254 | 4050 | 0.9062 | - |
401
+ | 2.7591 | 4100 | 0.8807 | 0.9875 (-0.0125) |
402
+ | 2.7927 | 4150 | 0.8252 | - |
403
+ | 2.8264 | 4200 | 0.8725 | 0.9875 (-0.0125) |
404
+ | 2.8600 | 4250 | 0.9094 | - |
405
+ | 2.8937 | 4300 | 0.8589 | 0.9874 (-0.0126) |
406
+ | 2.9273 | 4350 | 0.8625 | - |
407
+ | 2.9610 | 4400 | 0.8138 | 0.9874 (-0.0126) |
408
+ | 2.9946 | 4450 | 0.9217 | - |
409
+ | 3.0283 | 4500 | 0.8871 | 0.9872 (-0.0128) |
410
+ | 3.0619 | 4550 | 0.8504 | - |
411
+ | 3.0956 | 4600 | 0.944 | 0.9873 (-0.0127) |
412
+ | 3.1292 | 4650 | 0.8258 | - |
413
+ | 3.1629 | 4700 | 0.9054 | 0.9874 (-0.0126) |
414
+ | 3.1965 | 4750 | 0.8297 | - |
415
+ | 3.2301 | 4800 | 0.8483 | 0.9875 (-0.0125) |
416
+ | 3.2638 | 4850 | 0.909 | - |
417
+ | 3.2974 | 4900 | 0.8486 | 0.9892 (-0.0108) |
418
+ | 3.3311 | 4950 | 0.8937 | - |
419
+ | 3.3647 | 5000 | 0.8821 | 0.9874 (-0.0126) |
420
+ | 3.3984 | 5050 | 0.873 | - |
421
+ | 3.4320 | 5100 | 0.8773 | 0.9874 (-0.0126) |
422
+ | 3.4657 | 5150 | 0.8592 | - |
423
+ | 3.4993 | 5200 | 0.8449 | 0.9882 (-0.0118) |
424
+ | 3.5330 | 5250 | 0.8651 | - |
425
+ | 3.5666 | 5300 | 0.8943 | 0.9882 (-0.0118) |
426
+ | 3.6003 | 5350 | 0.8535 | - |
427
+ | 3.6339 | 5400 | 0.8687 | 0.9882 (-0.0118) |
428
+ | 3.6676 | 5450 | 0.9213 | - |
429
+ | 3.7012 | 5500 | 0.887 | 0.9882 (-0.0118) |
430
+ | 3.7349 | 5550 | 0.8787 | - |
431
+ | 3.7685 | 5600 | 0.8466 | 0.9882 (-0.0118) |
432
+ | 3.8022 | 5650 | 0.8517 | - |
433
+ | 3.8358 | 5700 | 0.8349 | 0.9883 (-0.0117) |
434
+ | 3.8694 | 5750 | 0.8647 | - |
435
+ | 3.9031 | 5800 | 0.8406 | 0.9882 (-0.0118) |
436
+ | 3.9367 | 5850 | 0.8385 | - |
437
+ | 3.9704 | 5900 | 0.8631 | 0.9882 (-0.0118) |
438
+ | 4.0040 | 5950 | 0.823 | - |
439
+ | 4.0377 | 6000 | 0.9163 | 0.9881 (-0.0119) |
440
+ | 4.0713 | 6050 | 0.8373 | - |
441
+ | 4.1050 | 6100 | 0.892 | 0.9882 (-0.0118) |
442
+ | 4.1386 | 6150 | 0.8666 | - |
443
+ | 4.1723 | 6200 | 0.8536 | 0.9882 (-0.0118) |
444
+ | 4.2059 | 6250 | 0.8784 | - |
445
+ | 4.2396 | 6300 | 0.9616 | 0.9882 (-0.0118) |
446
+ | 4.2732 | 6350 | 0.8464 | - |
447
+ | 4.3069 | 6400 | 0.865 | 0.9882 (-0.0118) |
448
+ | 4.3405 | 6450 | 0.8411 | - |
449
+ | 4.3742 | 6500 | 0.8943 | 0.9882 (-0.0118) |
450
+ | 4.4078 | 6550 | 0.8577 | - |
451
+ | 4.4415 | 6600 | 0.8683 | 0.9882 (-0.0118) |
452
+ | 4.4751 | 6650 | 0.8706 | - |
453
+ | 4.5087 | 6700 | 0.8645 | 0.9882 (-0.0118) |
454
+ | 4.5424 | 6750 | 0.8899 | - |
455
+ | 4.5760 | 6800 | 0.8593 | 0.9882 (-0.0118) |
456
+ | 4.6097 | 6850 | 0.8838 | - |
457
+ | 4.6433 | 6900 | 0.8379 | 0.9882 (-0.0118) |
458
+ | 4.6770 | 6950 | 0.8759 | - |
459
+ | 4.7106 | 7000 | 0.8608 | 0.9882 (-0.0118) |
460
+ | 4.7443 | 7050 | 0.8858 | - |
461
+ | 4.7779 | 7100 | 0.8594 | 0.9882 (-0.0118) |
462
+ | 4.8116 | 7150 | 0.8403 | - |
463
+ | 4.8452 | 7200 | 0.8898 | 0.9882 (-0.0118) |
464
+ | 4.8789 | 7250 | 0.8382 | - |
465
+ | 4.9125 | 7300 | 0.8307 | 0.9882 (-0.0118) |
466
+ | 4.9462 | 7350 | 0.8601 | - |
467
+ | 4.9798 | 7400 | 0.8076 | 0.9882 (-0.0118) |
468
+
469
+ * The bold row denotes the saved checkpoint.
470
+ </details>
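
The Epoch column can be cross-checked against the hyperparameters. Assuming a single device, no gradient accumulation, and a final partial batch that is kept (`dataloader_drop_last: False`), 23,770 samples at batch size 16 give 1,486 steps per epoch, and dividing a step number by 1,486 reproduces the logged epoch values. A small sketch (the helper name `epoch_at_step` is hypothetical):

```python
import math

def epoch_at_step(step, num_samples=23770, batch_size=16):
    """Fractional epoch reached after `step` optimizer steps."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # partial batch kept
    return round(step / steps_per_epoch, 4)

for step in (50, 100, 1500, 7400):
    print(step, epoch_at_step(step))
```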
471
+
472
+ ### Framework Versions
473
+ - Python: 3.13.2
474
+ - Sentence Transformers: 4.1.0
475
+ - Transformers: 4.51.3
476
+ - PyTorch: 2.7.0+cu128
477
+ - Accelerate: 1.6.0
478
+ - Datasets: 3.6.0
479
+ - Tokenizers: 0.21.1
480
+
481
+ ## Citation
482
+
483
+ ### BibTeX
484
+
485
+ #### Sentence Transformers
486
+ ```bibtex
487
+ @inproceedings{reimers-2019-sentence-bert,
488
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
489
+ author = "Reimers, Nils and Gurevych, Iryna",
490
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
491
+ month = "11",
492
+ year = "2019",
493
+ publisher = "Association for Computational Linguistics",
494
+ url = "https://arxiv.org/abs/1908.10084",
495
+ }
496
+ ```
497
+
498
+ <!--
499
+ ## Glossary
500
+
501
+ *Clearly define terms in order to be accessible across audiences.*
502
+ -->
503
+
504
+ <!--
505
+ ## Model Card Authors
506
+
507
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
508
+ -->
509
+
510
+ <!--
511
+ ## Model Card Contact
512
+
513
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
514
+ -->
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "sbert_ce_default_activation_function": "torch.nn.modules.linear.Identity",
+   "sentence_transformers": {
+     "activation_fn": "torch.nn.modules.activation.Sigmoid",
+     "version": "4.1.0"
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cef312f794c75951055c03231a2b1e07598f7e5e332f8bacfa93f3500a610ced
+ size 133464836
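
As a sanity check, this file size is roughly what the architecture in `config.json` implies. The sketch below counts parameters for a standard BERT encoder with a pooler and a single-output classification head, using only values from the config. The layer layout is the usual BERT one and is an assumption here, not something read from the checkpoint; `bert_classifier_params` is a hypothetical helper.

```python
def bert_classifier_params(vocab=30522, hidden=384, layers=12, intermediate=1536,
                           max_pos=512, type_vocab=2, num_labels=1):
    layer_norm = 2 * hidden  # weight + bias
    # word, position and token-type embeddings, followed by a LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    # query, key, value and output projections, followed by a LayerNorm
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # two dense layers (up and down projection), followed by a LayerNorm
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + layer_norm)
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + layers * (attention + feed_forward) + pooler + classifier

n = bert_classifier_params()
print(n, n * 4)  # parameter count, and size in bytes at 4 bytes per float32
```

This gives 33,360,385 parameters, or 133,441,540 bytes in float32, which is 23,296 bytes less than the 133,464,836 bytes recorded above. The remainder is plausibly the safetensors header, though that is not verified here.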
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
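
The special-token IDs in this file determine how a (claim, evidence) pair is laid out for a BERT model: `[CLS] claim [SEP] evidence [SEP]`, with segment IDs 0 for the first part and 1 for the second. The sketch below builds that layout from already-tokenized ID lists. `build_pair_input` is a hypothetical helper, the word-piece IDs are made up, and the truncation rule (trim only the second text, and assume the first text fits) is a simplification rather than the tokenizer's actual strategy.

```python
CLS_ID, SEP_ID, MAX_LEN = 101, 102, 512  # values from tokenizer_config.json

def build_pair_input(ids_a, ids_b, max_len=MAX_LEN):
    """Return (input_ids, token_type_ids) for [CLS] a [SEP] b [SEP]."""
    budget_b = max_len - len(ids_a) - 3   # room left after 3 special tokens
    ids_b = ids_b[:max(budget_b, 0)]      # simplified: truncate only text b
    input_ids = [CLS_ID, *ids_a, SEP_ID, *ids_b, SEP_ID]
    # segment 0 covers [CLS], text a and the first [SEP]; segment 1 the rest
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    return input_ids, token_type_ids

ids, types = build_pair_input([2001, 2002], [3001, 3002, 3003])
print(ids)
print(types)
```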
vocab.txt ADDED
The diff for this file is too large to render. See raw diff