tomaarsen committed · Commit dca2e44 (verified) · 1 parent: 19bc59d

Add new CrossEncoder model

README.md ADDED
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:78704
- loss:ListMLELoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 95.02104960997458
  energy_consumed: 0.24445732105822607
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.917
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.4975
      name: Map
    - type: mrr@10
      value: 0.4843
      name: Mrr@10
    - type: ndcg@10
      value: 0.5506
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.3252
      name: Map
    - type: mrr@10
      value: 0.5679
      name: Mrr@10
    - type: ndcg@10
      value: 0.3756
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.5857
      name: Map
    - type: mrr@10
      value: 0.5922
      name: Mrr@10
    - type: ndcg@10
      value: 0.657
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.4695
      name: Map
    - type: mrr@10
      value: 0.5481
      name: Mrr@10
    - type: ndcg@10
      value: 0.5277
      name: Ndcg@10
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
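
Cross encoders score each (query, document) pair jointly, which is accurate but too slow to run over a whole corpus, so they are typically used to rerank candidates from a fast first-stage retriever. Below is a minimal retrieve-and-rerank sketch; the bi-encoder checkpoint and the three-sentence corpus are illustrative assumptions, not part of this model:

```python
from sentence_transformers import CrossEncoder, SentenceTransformer

# Assumed first-stage retriever; any bi-encoder embedding model works here.
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
reranker = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle")

corpus = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]
query = "How many calories in an egg"

# Stage 1: embed everything and keep the top-k candidates by similarity.
similarities = retriever.similarity(retriever.encode([query]), retriever.encode(corpus))
top_k = similarities[0].argsort(descending=True)[:2].tolist()

# Stage 2: rescore only the retrieved candidates with the cross encoder.
candidates = [corpus[i] for i in top_k]
scores = reranker.predict([(query, doc) for doc in candidates])
for doc, score in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {doc}")
```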

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.4975 (+0.0079)     | 0.3252 (+0.0642)     | 0.5857 (+0.1661)     |
| mrr@10      | 0.4843 (+0.0068)     | 0.5679 (+0.0681)     | 0.5922 (+0.1655)     |
| **ndcg@10** | **0.5506 (+0.0102)** | **0.3756 (+0.0505)** | **0.6570 (+0.1563)** |
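
The table above is produced by the `CrossEncoderRerankingEvaluator`. A minimal sketch of running the same evaluator on your own data follows; the toy sample and its `query`/`positive`/`negative` layout are illustrative assumptions (the actual runs rerank 100 retrieved candidates per query):

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderRerankingEvaluator

model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle")

# Toy data for illustration: each sample pairs a query with its relevant
# ("positive") and non-relevant ("negative") candidate documents.
samples = [
    {
        "query": "How many calories in an egg",
        "positive": ["There are on average between 55 and 80 calories in an egg depending on its size."],
        "negative": ["Most of the calories in an egg come from the yellow yolk in the center."],
    },
]
evaluator = CrossEncoderRerankingEvaluator(samples, at_k=10, always_rerank_positives=True)
print(evaluator(model))  # dict containing map, mrr@10 and ndcg@10 values
```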

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.4695 (+0.0794)     |
| mrr@10      | 0.5481 (+0.0801)     |
| **ndcg@10** | **0.5277 (+0.0724)** |
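
This mean score can be reproduced with the same evaluator and parameters; a sketch follows (the evaluator downloads the small NanoBEIR datasets from the Hugging Face Hub on first use):

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator

model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle")

# Same parameters as listed above: rerank the top 100 candidates per query
# and report map / mrr@10 / ndcg@10 over three NanoBEIR datasets.
evaluator = CrossEncoderNanoBEIREvaluator(
    dataset_names=["msmarco", "nfcorpus", "nq"],
    rerank_k=100,
    at_k=10,
    always_rerank_positives=True,
)
print(evaluator(model))  # per-dataset scores plus their mean
```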

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 78,704 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query | docs | labels |
  |:--------|:------|:-----|:-------|
  | type    | string | list | list |
  | details | <ul><li>min: 9 characters</li><li>mean: 33.73 characters</li><li>max: 119 characters</li></ul> | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>what to avoid during early pregnancy</code> | <code>['Although caffeine does not come under the category of foods to avoid during early pregnancy, pregnant women are advised to limit their caffeine consumption. Caffeine can be found in tea, coffee, soft drinks, chocolate etc.', 'Learn what foods to eat and what to avoid during pregnancy to ensure a healthy environment for your unborn baby! As a concerned parent, you want to do everything possible to ensure the well being and safety of your baby.', 'To stay safe, also avoid these foods during your pregnancy. Meats. 1 Cold cuts, deli meats, hot dogs, and other ready-to-eat meats. ( 2 You can safely eat these if they are heated to steaming and served hot.). 3 Pre-stuffed, fresh, turkey or chicken. 4 Steak tartare or any raw meat. 5 Rare cuts of meat and undercooked meats.', 'Raw and undercooked meat is another among foods to avoid during first trimester. Make sure that the meat is well cooked and consume while it is still hot. It would be good to avoid processed meat since pregnant wom...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>where is bells creek</code> | <code>['Simpsonville, SC Real Estate. #facebook# Bells Creek is a small neighborhood in Simpsonville, SC, close to Bells Crossing Elementary. Located near Woodruff Rd., I-85 and I-385, the upscale Bells Creek homes are typically on large lots with mature trees. Bells Creek amenities include a pool and cabana. Bells Creek real estate prices average $220,000.', '#facebook# Bells Creek is a small neighborhood in Simpsonville, SC, close to Bells Crossing Elementary. Located near Woodruff Rd., I-85 and I-385, the upscale Bells Creek homes are typically on large lots with mature trees. Bells Creek amenities include a pool and cabana. Bells Creek real estate prices average $220,000.', "Welcome to The Overlook at Bells Creek, an exclusive Eastwood Homes' Greenville area community only minutes away from Five Forks in Simpsonville, SC.", 'Property Details. Property details for 213 Bells Creek Dr, Simpsonville, SC 29681. This Single Family Home is located at Bells Creek in Simpsonville, South Carolina. The home provides approximately 2074 square feet of living space. This property features 4 bedrooms. There are 3 bathrooms. 213 Bells Creek Dr, Simpsonville, SC 29681 falls within the Greenville county lines. This home sold for $180,000 on Dec 17, 2014. Similar homes in the area are priced around $187,091.']</code> | <code>[1, 1, 0, 0]</code> |
  | <code>how long does it take to hatch geese eggs in an incubator</code> | <code>['Geese take 31 days of incubation for a goose egg to hatch. Whether underneath its parents, or in an incubator, the incubation time is the same.', 'While chicken eggs take 21 days, for example, geese can take between 30 and 35 days and need a higher humidity level. Goslings are also more likely to hatch if the eggs are sprayed with water every day between days 6 and about 25, whereas chicken eggs need to be kept in humid conditions, but dry.', 'Incubation Duration. Incubating goose eggs should be done for a period of about 28 days for smaller breeds, and up to 35 days for larger breeds before pipping begins. Once goose eggs begin hatching, the process can take up to three days before they are completely out of their shell.', "It takes 21 days to incubate the egg. the 21st day is the hatching day. if the eggs are mail order don't count the day the eggs arrive if the temperature is below 55 degrees f … . on the 21st day the eggs will hatch at all different times.", '5. Wait until the eg...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListMLELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listmleloss) with these parameters:
  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```
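
For reference, a minimal training sketch that wires this loss together with the non-default hyperparameters listed further below. It assumes a Sentence Transformers version that ships the `CrossEncoderTrainer` API; the one-row dataset is an illustrative stand-in for the 78,704 prepared MS MARCO samples, and `output_dir` is arbitrary:

```python
from datasets import Dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import ListMLELoss

# Start from the same base model with a single output label.
model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)

# Illustrative stand-in for the prepared MS MARCO data in the
# query / docs / labels layout described above.
train_dataset = Dataset.from_dict({
    "query": ["How many calories in an egg"],
    "docs": [[
        "There are on average between 55 and 80 calories in an egg depending on its size.",
        "Most of the calories in an egg come from the yellow yolk in the center.",
    ]],
    "labels": [[1, 0]],
})

# Parameter names mirror the loss configuration above; this card's run also
# used a position-aware lambda weight (ListMLELambdaWeight).
loss = ListMLELoss(model, mini_batch_size=16, respect_input_order=True)

args = CrossEncoderTrainingArguments(
    output_dir="reranker-minilm-listmle",  # arbitrary
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
)
trainer = CrossEncoderTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()
```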

### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 1,000 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query | docs | labels |
  |:--------|:------|:-----|:-------|
  | type    | string | list | list |
  | details | <ul><li>min: 12 characters</li><li>mean: 33.41 characters</li><li>max: 94 characters</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>who does glenn beck want for speaker of the house</code> | <code>["|. Nebraska Senator Ben Sasse spoke to Glenn Beck Wednesday about Sasse's idea to make Arthur Brooks, head of the American Enterprise Institute, the next Speaker of the House. There’s nothing in the Constitution that requires you to have a Speaker who is an elected member of Congress, Sasse said.", 'And the guy that kept coming to my mind as I was watching Sunday Night Football is Arthur Brooks, the head of AEI. And so I think that the House Republicans should think about going outside the box. There’s nothing in the Constitution that requires you to have a Speaker who is an elected member of Congress.', 'Share on Facebook Share on Twitter. Conservative radio host Glenn Beck went off on Republican leadership after the House passed a budget compromise Thursday, calling House Speaker John Boehner “worthless” and Senate Minority Leader Mitch McConnell a “liar.”.', '“I think John Boehner is one of the prime examples of worthless, worthless Republicans,” Beck said Thursday on Mark Levin’s...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how long do you have to keep former employee records</code> | <code>['Employee Contracts. If you have employment contracts with your employees, then you should maintain these contracts for at least 10 years. According to Financial Web, you should “err on the side of caution” and maintain employee records for a longer period than you think you may need, in case a legal issue arises.', 'Small business expert Rieva Lesonsky suggests you keep employment records for a minimum of two years and for up to seven years. She says that most states have a two-year statute of limitations on lawsuit filings by former employees, so you want to make sure you have documents at hand if this occurs.', 'Since employees may come and go, you may wonder how long you should hang on to the employee records. The Internal Revenue Service (IRS) weighs in on records pertaining to employee taxes, such as payroll, but the other records depend on what types of records you have for employees.', 'Effective January 1, 2013, California law provides that current and former employees (or a ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what year was velcro invented</code> | <code>['Velcro was invented by George de Mestral a Swiss electrical engineer in 1941. This idea of inventing Velcro came to him when one day he returned after a walk from the hills and found cockleburs stuck to his clothes and his dog’s fur. George noticed its natural hook and loop quality and started making a fabric fastener on the same quality.', 'Velcro, which was invented by a Swiss Electrical Engineer George de Mestral, comprises two layers and when both these sides are hard-pressed together, they assist in fixing two surfaces. The thought to invent Velcro hits Mestral’s mind in the year 1941 after coming back from a hunting tour with his dog. ', 'In 1958, de Mestral filed for a patent application for his hook-and-loop fastener in Switzerland, which was granted in 1961. The term Velcro is a registered trademark of Velcro Industries B.V. Velcro Industries is a privately held worldwide corporation manufacturing consumer and industrial products. Among them is a series of mechanical-based f...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListMLELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listmleloss) with these parameters:
  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step     | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
| -1         | -1       | -             | -               | 0.0344 (-0.5060)         | 0.2073 (-0.1178)          | 0.0336 (-0.4671)     | 0.0918 (-0.3636)           |
| 0.0002     | 1        | 1412.3083     | -               | -                        | -                         | -                    | -                          |
| 0.0508     | 250      | 887.7485      | -               | -                        | -                         | -                    | -                          |
| 0.1016     | 500      | 853.8898      | 903.5635        | 0.2242 (-0.3163)         | 0.2467 (-0.0783)          | 0.3585 (-0.1421)     | 0.2765 (-0.1789)           |
| 0.1525     | 750      | 867.3723      | -               | -                        | -                         | -                    | -                          |
| 0.2033     | 1000     | 851.3223      | 880.1996        | 0.4790 (-0.0614)         | 0.3435 (+0.0184)          | 0.5945 (+0.0938)     | 0.4723 (+0.0170)           |
| 0.2541     | 1250     | 840.5654      | -               | -                        | -                         | -                    | -                          |
| 0.3049     | 1500     | 836.1076      | 872.8075        | 0.5189 (-0.0216)         | 0.3394 (+0.0143)          | 0.6097 (+0.1091)     | 0.4893 (+0.0339)           |
| 0.3558     | 1750     | 853.3524      | -               | -                        | -                         | -                    | -                          |
| 0.4066     | 2000     | 859.1896      | 872.7851        | 0.5453 (+0.0049)         | 0.3638 (+0.0387)          | 0.6322 (+0.1315)     | 0.5137 (+0.0584)           |
| 0.4574     | 2250     | 816.2849      | -               | -                        | -                         | -                    | -                          |
| 0.5082     | 2500     | 832.0728      | 866.5376        | 0.5428 (+0.0023)         | 0.3737 (+0.0487)          | 0.6384 (+0.1378)     | 0.5183 (+0.0629)           |
| 0.5591     | 2750     | 825.9285      | -               | -                        | -                         | -                    | -                          |
| 0.6099     | 3000     | 809.4326      | 865.0468        | 0.5319 (-0.0085)         | 0.3488 (+0.0238)          | 0.6320 (+0.1313)     | 0.5042 (+0.0489)           |
| 0.6607     | 3250     | 807.3669      | -               | -                        | -                         | -                    | -                          |
| 0.7115     | 3500     | 828.0153      | 869.0601        | 0.5479 (+0.0075)         | 0.3690 (+0.0440)          | 0.6495 (+0.1488)     | 0.5221 (+0.0668)           |
| 0.7624     | 3750     | 841.2574      | -               | -                        | -                         | -                    | -                          |
| 0.8132     | 4000     | 814.0583      | 865.1564        | 0.5406 (+0.0001)         | 0.3571 (+0.0320)          | 0.6519 (+0.1513)     | 0.5165 (+0.0612)           |
| 0.8640     | 4250     | 814.6952      | -               | -                        | -                         | -                    | -                          |
| **0.9148** | **4500** | **825.9762**  | **864.4775**    | **0.5506 (+0.0102)**     | **0.3756 (+0.0505)**      | **0.6570 (+0.1563)** | **0.5277 (+0.0724)**       |
| 0.9656     | 4750     | 821.2723      | -               | -                        | -                         | -                    | -                          |
| -1         | -1       | -             | -               | 0.5506 (+0.0102)         | 0.3756 (+0.0505)          | 0.6570 (+0.1563)     | 0.5277 (+0.0724)           |

* The bold row denotes the saved checkpoint.

### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.244 kWh
- **Carbon Emitted**: 0.095 kg of CO2
- **Hours Used**: 0.917 hours

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.1
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListMLELoss
```bibtex
@inproceedings{lan2013position,
    title={Position-aware ListMLE: a sequential learning process for ranking},
    author={Lan, Yanyan and Guo, Jiafeng and Cheng, Xueqi and Liu, Tie-Yan},
    booktitle={Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence},
    pages={333--342},
    year={2013}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
```json
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.activation.Sigmoid"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
```
model.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:76666299495f576c68c6ce38fa2ad2c08520497f797e2d8de9556d79bec07a30
size 133464836
```
special_tokens_map.json ADDED
```json
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
```
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
```json
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
```
vocab.txt ADDED
The diff for this file is too large to render. See raw diff