tomaarsen HF Staff committed · verified
Commit 8bed20e · Parent(s): de8b714

Add new SparseEncoder model
1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "pooling_strategy": "max",
+   "activation_function": "relu",
+   "word_embedding_dimension": 30522
+ }
README.md ADDED
@@ -0,0 +1,850 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - sparse-encoder
+ - sparse
+ - splade
+ - generated_from_trainer
+ - dataset_size:99000
+ - loss:SpladeLoss
+ - loss:SparseDistillKLDivLoss
+ - loss:FlopsLoss
+ base_model: Luyu/co-condenser-marco
+ widget:
+ - text: 'The ejection fraction may decrease if: 1 You have weakness of your heart
+     muscle, such as dilated cardiomyopathy, which can be caused by a heart muscle
+     problem, familial (genetic) cardiomyopathy, or systemic illnesses. 2 A heart
+     attack has damaged your heart. You have problems with your heart''s valves.'
+ - text: "One thing we avoided: Lots of alternative slime recipes swap Borax for liquid\
+     \ starch, shampoo, body wash, hand soap, contact lens solution, or laundry detergent.\
+     \ Those may seem benign — and they might be — but many of them\
+     \ contain derivatives or relatives of sodium borate too."
+ - text: how do i get my mvr in pa
+ - text: English is a language whose vocabulary is the composite of a surprising range
+     of influences. We have pillaged words from Latin, Greek, Dutch, Arabic, Old Norse,
+     Spanish, Italian, Hindi, and more besides to make English what it is today.
+ - text: Weed Eater was a string trimmer company founded in 1971 in Houston, Texas
+     by George C. Ballas, Sr. , the inventor of the device. The idea for the Weed Eater
+     trimmer came to him from the spinning nylon bristles of an automatic car wash.He
+     thought that he could come up with a similar technique to protect the bark on
+     trees that he was trimming around. His company was eventually bought by Emerson
+     Electric and merged with Poulan.Poulan/Weed Eater was later purchased by Electrolux,
+     which spun off the outdoors division as Husqvarna AB in 2006.Inventor Ballas was
+     the father of champion ballroom dancer Corky Ballas and the grandfather of Dancing
+     with the Stars dancer Mark Ballas.George Ballas died on June 25, 2011.he idea
+     for the Weed Eater trimmer came to him from the spinning nylon bristles of an
+     automatic car wash. He thought that he could come up with a similar technique
+     to protect the bark on trees that he was trimming around. His company was eventually
+     bought by Emerson Electric and merged with Poulan.
+ pipeline_tag: feature-extraction
+ library_name: sentence-transformers
+ metrics:
+ - dot_accuracy@1
+ - dot_accuracy@3
+ - dot_accuracy@5
+ - dot_accuracy@10
+ - dot_precision@1
+ - dot_precision@3
+ - dot_precision@5
+ - dot_precision@10
+ - dot_recall@1
+ - dot_recall@3
+ - dot_recall@5
+ - dot_recall@10
+ - dot_ndcg@10
+ - dot_mrr@10
+ - dot_map@100
+ - query_active_dims
+ - query_sparsity_ratio
+ - corpus_active_dims
+ - corpus_sparsity_ratio
+ co2_eq_emissions:
+   emissions: 78.12595691469743
+   energy_consumed: 0.2009919087493695
+   source: codecarbon
+   training_type: fine-tuning
+   on_cloud: false
+   cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
+   ram_total_size: 31.777088165283203
+   hours_used: 0.571
+   hardware_used: 1 x NVIDIA GeForce RTX 3090
+ model-index:
+ - name: CoCondenser finetuned on MS MARCO
+   results:
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: NanoMSMARCO
+       type: NanoMSMARCO
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.42
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.56
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.72
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.9
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.42
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.18666666666666668
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.14400000000000002
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.08999999999999998
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.42
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.56
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.72
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.9
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.6291399713464962
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.5467460317460318
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.5503396478777393
+       name: Dot Map@100
+     - type: query_active_dims
+       value: 23.31999969482422
+       name: Query Active Dims
+     - type: query_sparsity_ratio
+       value: 0.9992359609562013
+       name: Query Sparsity Ratio
+     - type: corpus_active_dims
+       value: 257.3004150390625
+       name: Corpus Active Dims
+     - type: corpus_sparsity_ratio
+       value: 0.9915700014730665
+       name: Corpus Sparsity Ratio
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: NanoNFCorpus
+       type: NanoNFCorpus
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.44
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.56
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.6
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.66
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.44
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.38666666666666666
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.32800000000000007
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.272
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.041590314149379026
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.07672442108786207
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.09154300468916865
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.1433130618338512
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.33898990155781883
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.5123809523809524
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.1505453583653259
+       name: Dot Map@100
+     - type: query_active_dims
+       value: 21.260000228881836
+       name: Query Active Dims
+     - type: query_sparsity_ratio
+       value: 0.9993034532393394
+       name: Query Sparsity Ratio
+     - type: corpus_active_dims
+       value: 494.8533630371094
+       name: Corpus Active Dims
+     - type: corpus_sparsity_ratio
+       value: 0.9837869941996883
+       name: Corpus Sparsity Ratio
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: NanoNQ
+       type: NanoNQ
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.44
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.74
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.78
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.84
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.44
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.2533333333333333
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.16399999999999998
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.08999999999999998
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.42
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.7
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.74
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.81
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.6304630848492498
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.5837460317460317
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.5712846533262134
+       name: Dot Map@100
+     - type: query_active_dims
+       value: 28.0
+       name: Query Active Dims
+     - type: query_sparsity_ratio
+       value: 0.9990826289233995
+       name: Query Sparsity Ratio
+     - type: corpus_active_dims
+       value: 290.61212158203125
+       name: Corpus Active Dims
+     - type: corpus_sparsity_ratio
+       value: 0.9904786016125408
+       name: Corpus Sparsity Ratio
+   - task:
+       type: sparse-nano-beir
+       name: Sparse Nano BEIR
+     dataset:
+       name: NanoBEIR mean
+       type: NanoBEIR_mean
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.43333333333333335
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.62
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.6999999999999998
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.7999999999999999
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.43333333333333335
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.27555555555555555
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.21200000000000005
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.15066666666666664
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.29386343804979304
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.44557480702928737
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.5171810015630562
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.6177710206112837
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.5328643192511883
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.5476243386243386
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.42405655318975954
+       name: Dot Map@100
+     - type: query_active_dims
+       value: 24.19333330790202
+       name: Query Active Dims
+     - type: query_sparsity_ratio
+       value: 0.9992073477063135
+       name: Query Sparsity Ratio
+     - type: corpus_active_dims
+       value: 324.00429792464917
+       name: Corpus Active Dims
+     - type: corpus_sparsity_ratio
+       value: 0.9893845652996315
+       name: Corpus Sparsity Ratio
+ ---
+
+ # CoCondenser finetuned on MS MARCO
+
+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [Luyu/co-condenser-marco](https://huggingface.co/Luyu/co-condenser-marco) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SPLADE Sparse Encoder
+ - **Base model:** [Luyu/co-condenser-marco](https://huggingface.co/Luyu/co-condenser-marco) <!-- at revision e0cef0ab2410aae0f0994366ddefb5649a266709 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 30522 dimensions
+ - **Similarity Function:** Dot Product
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
+
+ ### Full Model Architecture
+
+ ```
+ SparseEncoder(
+   (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
+   (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
+ )
+ ```
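+
+ For intuition: the `SpladePooling` module turns the MLM head's logits into one vocabulary-sized sparse vector per text by applying a log-saturated ReLU and max-pooling over the sequence. The following is a minimal illustrative sketch of that computation in plain PyTorch, not the library's actual implementation:
+
+ ```python
+ import torch
+
+ def splade_pool(mlm_logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+     """Sketch of SPLADE max pooling: (batch, seq, vocab) logits -> (batch, vocab) vector."""
+     # Log-saturated ReLU activation, as in the SPLADE papers
+     scores = torch.log1p(torch.relu(mlm_logits))
+     # Zero out padding positions so they cannot contribute to the max
+     scores = scores * attention_mask.unsqueeze(-1)
+     # 'pooling_strategy': 'max' -> max over the sequence dimension
+     return scores.max(dim=1).values
+ ```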
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from sentence_transformers import SparseEncoder
+
+ # Download from the 🤗 Hub
+ model = SparseEncoder("tomaarsen/splade-cocondenser-kldiv-marginmse-minilm-temp-4")
+ # Run inference
+ queries = [
+     "who started gladiator lacrosse",
+ ]
+ documents = [
+     'Weed Eater was a string trimmer company founded in 1971 in Houston, Texas by George C. Ballas, Sr. , the inventor of the device. The idea for the Weed Eater trimmer came to him from the spinning nylon bristles of an automatic car wash.He thought that he could come up with a similar technique to protect the bark on trees that he was trimming around. His company was eventually bought by Emerson Electric and merged with Poulan.Poulan/Weed Eater was later purchased by Electrolux, which spun off the outdoors division as Husqvarna AB in 2006.Inventor Ballas was the father of champion ballroom dancer Corky Ballas and the grandfather of Dancing with the Stars dancer Mark Ballas.George Ballas died on June 25, 2011.he idea for the Weed Eater trimmer came to him from the spinning nylon bristles of an automatic car wash. He thought that he could come up with a similar technique to protect the bark on trees that he was trimming around. His company was eventually bought by Emerson Electric and merged with Poulan.',
+     "The earliest types of gladiator were named after Rome's enemies of that time: the Samnite, Thracian and Gaul. The Samnite, heavily armed, elegantly helmed and probably the most popular type, was renamed Secutor and the Gaul renamed Murmillo, once these former enemies had been conquered then absorbed into Rome's Empire.",
+     'Summit Hill, PA. Sponsored Topics. Summit Hill is a borough in Carbon County, Pennsylvania, United States. The population was 2,974 at the 2000 census. Summit Hill is located at 40°49′39″N 75°51′57″W / 40.8275°N 75.86583°W / 40.8275; -75.86583 (40.827420, -75.865892).',
+ ]
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+ print(query_embeddings.shape, document_embeddings.shape)
+ # [1, 30522] [3, 30522]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(query_embeddings, document_embeddings)
+ print(similarities)
+ # tensor([[19.3181, 29.9645, 13.8348]])
+ ```
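+
+ Because the embeddings live in the tokenizer's vocabulary space, you can also inspect which tokens drive a match. A small sketch, assuming a sentence-transformers version where `SparseEncoder.decode` is available (check your installed release):
+
+ ```python
+ # Map the non-zero dimensions of the first query embedding back to tokens.
+ # `decode` returns (token, weight) pairs, highest weights first.
+ for token, weight in model.decode(query_embeddings[0], top_k=10):
+     print(f"{token:>15}  {weight:.2f}")
+ ```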
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Sparse Information Retrieval
+
+ * Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)
+
+ | Metric                | NanoMSMARCO | NanoNFCorpus | NanoNQ     |
+ |:----------------------|:------------|:-------------|:-----------|
+ | dot_accuracy@1        | 0.42        | 0.44         | 0.44       |
+ | dot_accuracy@3        | 0.56        | 0.56         | 0.74       |
+ | dot_accuracy@5        | 0.72        | 0.6          | 0.78       |
+ | dot_accuracy@10       | 0.9         | 0.66         | 0.84       |
+ | dot_precision@1       | 0.42        | 0.44         | 0.44       |
+ | dot_precision@3       | 0.1867      | 0.3867       | 0.2533     |
+ | dot_precision@5       | 0.144       | 0.328        | 0.164      |
+ | dot_precision@10      | 0.09        | 0.272        | 0.09       |
+ | dot_recall@1          | 0.42        | 0.0416       | 0.42       |
+ | dot_recall@3          | 0.56        | 0.0767       | 0.7        |
+ | dot_recall@5          | 0.72        | 0.0915       | 0.74       |
+ | dot_recall@10         | 0.9         | 0.1433       | 0.81       |
+ | **dot_ndcg@10**       | **0.6291**  | **0.339**    | **0.6305** |
+ | dot_mrr@10            | 0.5467      | 0.5124       | 0.5837     |
+ | dot_map@100           | 0.5503      | 0.1505       | 0.5713     |
+ | query_active_dims     | 23.32       | 21.26        | 28.0       |
+ | query_sparsity_ratio  | 0.9992      | 0.9993       | 0.9991     |
+ | corpus_active_dims    | 257.3004    | 494.8534     | 290.6121   |
+ | corpus_sparsity_ratio | 0.9916      | 0.9838       | 0.9905     |
+
+ #### Sparse Nano BEIR
+
+ * Dataset: `NanoBEIR_mean`
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
+   ```json
+   {
+       "dataset_names": [
+           "msmarco",
+           "nfcorpus",
+           "nq"
+       ]
+   }
+   ```
+
+ | Metric                | Value      |
+ |:----------------------|:-----------|
+ | dot_accuracy@1        | 0.4333     |
+ | dot_accuracy@3        | 0.62       |
+ | dot_accuracy@5        | 0.7        |
+ | dot_accuracy@10       | 0.8        |
+ | dot_precision@1       | 0.4333     |
+ | dot_precision@3       | 0.2756     |
+ | dot_precision@5       | 0.212      |
+ | dot_precision@10      | 0.1507     |
+ | dot_recall@1          | 0.2939     |
+ | dot_recall@3          | 0.4456     |
+ | dot_recall@5          | 0.5172     |
+ | dot_recall@10         | 0.6178     |
+ | **dot_ndcg@10**       | **0.5329** |
+ | dot_mrr@10            | 0.5476     |
+ | dot_map@100           | 0.4241     |
+ | query_active_dims     | 24.1933    |
+ | query_sparsity_ratio  | 0.9992     |
+ | corpus_active_dims    | 324.0043   |
+ | corpus_sparsity_ratio | 0.9894     |
+
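+ The NanoBEIR numbers above can be recomputed with the evaluator and parameters just listed; a minimal sketch (only `dataset_names` is taken from this card, everything else is left at its defaults):
+
+ ```python
+ from sentence_transformers import SparseEncoder
+ from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator
+
+ model = SparseEncoder("tomaarsen/splade-cocondenser-kldiv-marginmse-minilm-temp-4")
+ evaluator = SparseNanoBEIREvaluator(dataset_names=["msmarco", "nfcorpus", "nq"])
+ results = evaluator(model)
+ # The mean dot_ndcg@10 across the three Nano datasets
+ print(results[evaluator.primary_metric])
+ ```
+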
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 99,000 training samples
+ * Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | query | positive | negative | label |
+   |:--------|:------|:---------|:---------|:------|
+   | type    | string | string | string | list |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 9.2 tokens</li><li>max: 34 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 79.86 tokens</li><li>max: 219 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 79.96 tokens</li><li>max: 270 tokens</li></ul> | <ul><li>size: 2 elements</li></ul> |
+ * Samples:
+   | query | positive | negative | label |
+   |:------|:---------|:---------|:------|
+   | <code>rtn tv network</code> | <code>Home Shopping Network. Home Shopping Network (HSN) is an American broadcast, basic cable and satellite television network that is owned by HSN, Inc. (NASDAQ: HSNI), which also owns catalog company Cornerstone Brands. Based in St. Petersburg, Florida, United States, the home shopping channel has former and current sister channels in several other countries.</code> | <code>The Public Switched Telephone Network - The public switched telephone network (PSTN) is the international network of circuit-switched telephones. Learn more about PSTN at HowStuffWorks. x</code> | <code>[-1.0804121494293213, -5.908488750457764]</code> |
+   | <code>how did president nixon react to the watergate investigation?</code> | <code>The Watergate scandal was a major political scandal that occurred in the United States during the early 1970s, following a break-in by five men at the Democratic National Committee headquarters at the Watergate office complex in Washington, D.C. on June 17, 1972, and President Richard Nixon's administration's subsequent attempt to cover up its involvement. After the five burglars were caught and the conspiracy was discovered, Watergate was investigated by the United States Congress. Meanwhile, N</code> | <code>The release of the tape was ordered by the Supreme Court on July 24, 1974, in a case known as United States v. Nixon. The court’s decision was unanimous. President Nixon released the tape on August 5. It was one of three conversations he had with Haldeman six days after the Watergate break-in. The tapes prove that he ordered a cover-up of the Watergate burglary. The Smoking Gun tape reveals that Nixon ordered the FBI to abandon its investigation of the break-in. [Read more…]</code> | <code>[4.117279052734375, 3.191757917404175]</code> |
+   | <code>what is a summary offense in pennsylvania</code> | <code>We provide cost effective house arrest and electronic monitoring services to magisterial district court systems throughout Pennsylvania including York, Harrisburg, Philadelphia and Allentown.In addition, we also serve the York County, Lancaster County and Chester County.e provide cost effective house arrest and electronic monitoring services to magisterial district court systems throughout Pennsylvania including York, Harrisburg, Philadelphia and Allentown.</code> | <code>In order to be convicted of Simple Assault, one must cause bodily injury. To be convicted of Aggravated Assault, one must cause serious bodily injury. From my research, Pennsylvania law defines bodily injury as the impairment of physical condition or substantial pain.</code> | <code>[-8.954689025878906, -1.3361705541610718]</code> |
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
+   ```json
+   {
+       "loss": "SparseDistillKLDivLoss",
+       "lambda_corpus": 0.0005,
+       "lambda_query": 0.0005
+   }
+   ```
+
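+ In code, the loss configuration above corresponds to wrapping the KL-divergence distillation loss in `SpladeLoss`, which adds FLOPS sparsity regularization on queries and documents. A sketch using the parameter names logged in this card (`lambda_query`/`lambda_corpus`; newer sentence-transformers releases may spell these keyword arguments differently):
+
+ ```python
+ from sentence_transformers import SparseEncoder
+ from sentence_transformers.sparse_encoder import losses
+
+ model = SparseEncoder("Luyu/co-condenser-marco")
+ # KL divergence against the teacher scores in the `label` column,
+ # plus FLOPS regularization weighted per the card's logged config.
+ loss = losses.SpladeLoss(
+     model=model,
+     loss=losses.SparseDistillKLDivLoss(model),
+     lambda_query=0.0005,
+     lambda_corpus=0.0005,
+ )
+ ```
+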
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 1,000 evaluation samples
+ * Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | query | positive | negative | label |
+   |:--------|:------|:---------|:---------|:------|
+   | type    | string | string | string | list |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 9.12 tokens</li><li>max: 37 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 78.91 tokens</li><li>max: 239 tokens</li></ul> | <ul><li>min: 25 tokens</li><li>mean: 81.25 tokens</li><li>max: 239 tokens</li></ul> | <ul><li>size: 2 elements</li></ul> |
+ * Samples:
+   | query | positive | negative | label |
+   |:------|:---------|:---------|:------|
+   | <code>how long to cook roast beef for</code> | <code>Roasting times for beef. Preheat your oven to 160°C (325°F) and use these cooking times to prepare a roast that's moist, tender and delicious. Your roast should be covered with foil for the first half of the roasting time to prevent drying the outer layer.3 to 5lb Joint 1½ to 2 hours.reheat your oven to 160°C (325°F) and use these cooking times to prepare a roast that's moist, tender and delicious. Your roast should be covered with foil for the first half of the roasting time to prevent drying the outer layer.</code> | <code>Estimating Cooking Time for Large Beef Roasts. If you roast at a steady 325F (160C), subtract 2 minutes or so per pound. If the roast is refrigerated just before going into the oven, add 2 or 3 minutes per pound. WARNING NOTES: Remember, the rib roast will continue to cook as it sets.</code> | <code>[6.501978874206543, 8.214995384216309]</code> |
+   | <code>definition of fire inspection</code> | <code>Learn how to do a monthly fire extinguisher inspection in your workplace. Departments must assign an individual to inspect monthly the extinguishers in or adjacent to the department's facilities.1 Read Fire Extinguisher Types and Maintenance for more information.earn how to do a monthly fire extinguisher inspection in your workplace. Departments must assign an individual to inspect monthly the extinguishers in or adjacent to the department's facilities.</code> | <code>reconnaissance by fire-a method of reconnaissance in which fire is placed on a suspected enemy position in order to cause the enemy to disclose his presence by moving or returning fire. reconnaissance in force-an offensive operation designed to discover or test the enemy's strength (or to obtain other information). mission undertaken to obtain, by visual observation or other detection methods, information about the activities and resources of an enemy or potential enemy, or to secure data concerning the meteorological, hydrographic, or geographic characteristics of a particular area.</code> | <code>[-0.38299351930618286, -0.9372650384902954]</code> |
+   | <code>how many stores does family dollar have</code> | <code>Property Spotlight: New Retail Center at Hamilton & Warner - Outlots Available!! Family Dollar is closing stores following a disappointing second quarter. Family Dollar Stores Inc. won’t just be cutting prices in an attempt to boost its business – it’ll be closing stores as well. The Matthews, N.C.-based discount retailer plans to shutter 370 under-performing shops, according to the Charlotte Business Journal.</code> | <code>Glassdoor has 1,976 Family Dollar Stores reviews submitted anonymously by Family Dollar Stores employees. Read employee reviews and ratings on Glassdoor to decide if Family Dollar Stores is right for you.</code> | <code>[4.726407527923584, 8.284608840942383]</code> |
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
+   ```json
+   {
+       "loss": "SparseDistillKLDivLoss",
+       "lambda_corpus": 0.0005,
+       "lambda_query": 0.0005
+   }
+   ```
+
+ ### Training Hyperparameters
+
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: no_duplicates
+
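+ Put together, a training run with these settings looks roughly like the sketch below; the dataset loading and evaluator are elided, `train_dataset` is assumed to provide the query/positive/negative/label columns described above, and `model`/`loss` are as in the earlier loss sketch (class names follow the sentence-transformers sparse-encoder API):
+
+ ```python
+ from sentence_transformers.sparse_encoder import (
+     SparseEncoderTrainer,
+     SparseEncoderTrainingArguments,
+ )
+
+ args = SparseEncoderTrainingArguments(
+     output_dir="splade-co-condenser-marco",
+     num_train_epochs=1,
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     learning_rate=2e-5,
+     warmup_ratio=0.1,
+     fp16=True,
+     eval_strategy="steps",          # requires an eval_dataset, elided here
+     batch_sampler="no_duplicates",  # avoid duplicate texts within a batch
+ )
+ trainer = SparseEncoderTrainer(
+     model=model,                  # the SparseEncoder from the loss sketch
+     args=args,
+     train_dataset=train_dataset,  # assumed: query/positive/negative/label
+     loss=loss,                    # the SpladeLoss configured earlier
+ )
+ trainer.train()
+ ```
+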
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
+ |:------:|:----:|:-------------:|:---------------:|:-----------------------:|:------------------------:|:------------------:|:-------------------------:|
+ | -1     | -1   | -             | -               | 0.0823                  | 0.0412                   | 0.0621             | 0.0619                    |
+ | 0.0162 | 100  | 740.8226      | -               | -                       | -                        | -                  | -                         |
+ | 0.0323 | 200  | 82.2666       | -               | -                       | -                        | -                  | -                         |
+ | 0.0485 | 300  | 3.3514        | -               | -                       | -                        | -                  | -                         |
+ | 0.0646 | 400  | 1.9689        | -               | -                       | -                        | -                  | -                         |
+ | 0.0808 | 500  | 1.8268        | 1.8327          | 0.1979                  | 0.1096                   | 0.2507             | 0.1861                    |
+ | 0.0970 | 600  | 1.8           | -               | -                       | -                        | -                  | -                         |
+ | 0.1131 | 700  | 1.613         | -               | -                       | -                        | -                  | -                         |
+ | 0.1293 | 800  | 1.5977        | -               | -                       | -                        | -                  | -                         |
+ | 0.1454 | 900  | 1.5886        | -               | -                       | -                        | -                  | -                         |
+ | 0.1616 | 1000 | 1.3922        | 1.2983          | 0.5044                  | 0.2715                   | 0.5851             | 0.4537                    |
+ | 0.1778 | 1100 | 1.3708        | -               | -                       | -                        | -                  | -                         |
+ | 0.1939 | 1200 | 1.383         | -               | -                       | -                        | -                  | -                         |
+ | 0.2101 | 1300 | 1.2148        | -               | -                       | -                        | -                  | -                         |
+ | 0.2262 | 1400 | 1.246         | -               | -                       | -                        | -                  | -                         |
+ | 0.2424 | 1500 | 1.2206        | 1.0998          | 0.5329                  | 0.2969                   | 0.5945             | 0.4748                    |
+ | 0.2586 | 1600 | 1.1962        | -               | -                       | -                        | -                  | -                         |
+ | 0.2747 | 1700 | 1.1546        | -               | -                       | -                        | -                  | -                         |
+ | 0.2909 | 1800 | 1.1319        | -               | -                       | -                        | -                  | -                         |
+ | 0.3070 | 1900 | 1.1656        | -               | -                       | -                        | -                  | -                         |
+ | 0.3232 | 2000 | 1.1196        | 0.9878          | 0.5667                  | 0.3283                   | 0.6106             | 0.5019                    |
+ | 0.3394 | 2100 | 1.0789        | -               | -                       | -                        | -                  | -                         |
+ | 0.3555 | 2200 | 1.0148        | -               | -                       | -                        | -                  | -                         |
+ | 0.3717 | 2300 | 1.042         | -               | -                       | -                        | -                  | -                         |
+ | 0.3878 | 2400 | 1.0274        | -               | -                       | -                        | -                  | -                         |
+ | 0.4040 | 2500 | 1.0041        | 0.8749          | 0.6059                  | 0.3346                   | 0.5942             | 0.5116                    |
+ | 0.4202 | 2600 | 1.0557        | -               | -                       | -                        | -                  | -                         |
+ | 0.4363 | 2700 | 1.0077        | -               | -                       | -                        | -                  | -                         |
+ | 0.4525 | 2800 | 1.0115        | -               | -                       | -                        | -                  | -                         |
+ | 0.4686 | 2900 | 0.8708        | -               | -                       | -                        | -                  | -                         |
+ | 0.4848 | 3000 | 0.8838        | 0.9321          | 0.5826                  | 0.3264                   | 0.6354             | 0.5148                    |
+ | 0.5010 | 3100 | 0.9103        | -               | -                       | -                        | -                  | -                         |
+ | 0.5171 | 3200 | 0.8586        | -               | -                       | -                        | -                  | -                         |
+ | 0.5333 | 3300 | 0.9286        | -               | -                       | -                        | -                  | -                         |
+ | 0.5495 | 3400 | 0.8645        | -               | -                       | -                        | -                  | -                         |
+ | 0.5656 | 3500 | 0.9522        | 0.8105          | 0.6164                  | 0.3378                   | 0.6131             | 0.5224                    |
+ | 0.5818 | 3600 | 0.8636        | -               | -                       | -                        | -                  | -                         |
+ | 0.5979 | 3700 | 0.8634        | -               | -                       | -                        | -                  | -                         |
+ | 0.6141 | 3800 | 0.8555        | -               | -                       | -                        | -                  | -                         |
+ | 0.6303 | 3900 | 0.8447        | -               | -                       | -                        | -                  | -                         |
+ | 0.6464 | 4000 | 0.8331        | 0.7699          | 0.6033                  | 0.3442                   | 0.6016             | 0.5164                    |
+ | 0.6626 | 4100 | 0.8292        | -               | -                       | -                        | -                  | -                         |
+ | 0.6787 | 4200 | 0.8273        | -               | -                       | -                        | -                  | -                         |
+ | 0.6949 | 4300 | 0.8381        | -               | -                       | -                        | -                  | -                         |
+ | 0.7111 | 4400 | 0.8035        | -               | -                       | -                        | -                  | -                         |
+ | 0.7272 | 4500 | 0.8166        | 0.7743          | 0.6018                  | 0.3394                   | 0.6060             | 0.5157                    |
+ | 0.7434 | 4600 | 0.8245        | -               | -                       | -                        | -                  | -                         |
+ | 0.7595 | 4700 | 0.7831        | -               | -                       | -                        | -                  | -                         |
+ | 0.7757 | 4800 | 0.8314        | -               | -                       | -                        | -                  | -                         |
+ | 0.7919 | 4900 | 0.7994        | -               | -                       | -                        | -                  | -                         |
+ | 0.8080 | 5000 | 0.8018        | 0.7058          | 0.6236                  | 0.3413                   | 0.6378             | 0.5342                    |
+ | 0.8242 | 5100 | 0.7652        | -               | -                       | -                        | -                  | -                         |
+ | 0.8403 | 5200 | 0.7458        | -               | -                       | -                        | -                  | -                         |
+ | 0.8565 | 5300 | 0.8158        | -               | -                       | -                        | -                  | -                         |
+ | 0.8727 | 5400 | 0.7887        | -               | -                       | -                        | -                  | -                         |
+ | 0.8888 | 5500 | 0.7372        | 0.7389          | 0.6251                  | 0.3476                   | 0.6327             | 0.5351                    |
+ | 0.9050 | 5600 | 0.8           | -               | -                       | -                        | -                  | -                         |
+ | 0.9211 | 5700 | 0.7724        | -               | -                       | -                        | -                  | -                         |
+ | 0.9373 | 5800 | 0.7578        | -               | -                       | -                        | -                  | -                         |
+ | 0.9535 | 5900 | 0.7536        | -               | -                       | -                        | -                  | -                         |
+ | 0.9696 | 6000 | 0.7982        | 0.7011          | 0.6289                  | 0.3396                   | 0.6308             | 0.5331                    |
+ | 0.9858 | 6100 | 0.8084        | -               | -                       | -                        | -                  | -                         |
+ | -1     | -1   | -             | -               | 0.6291                  | 0.3390                   | 0.6305             | 0.5329                    |
+
+ ### Environmental Impact
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+ - **Energy Consumed**: 0.201 kWh
+ - **Carbon Emitted**: 0.078 kg of CO2
+ - **Hours Used**: 0.571 hours
+
+ ### Training Hardware
+ - **On Cloud**: No
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
+ - **RAM Size**: 31.78 GB
+
+ ### Framework Versions
+ - Python: 3.11.6
+ - Sentence Transformers: 4.2.0.dev0
+ - Transformers: 4.52.4
+ - PyTorch: 2.7.1+cu126
+ - Accelerate: 1.5.1
+ - Datasets: 2.21.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### SpladeLoss
+ ```bibtex
+ @misc{formal2022distillationhardnegativesampling,
+     title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
+     author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
+     year={2022},
+     eprint={2205.04733},
+     archivePrefix={arXiv},
+     primaryClass={cs.IR},
+     url={https://arxiv.org/abs/2205.04733},
+ }
+ ```
+
+ #### SparseDistillKLDivLoss
+ ```bibtex
+ @misc{lin2020distillingdenserepresentationsranking,
+     title={Distilling Dense Representations for Ranking using Tightly-Coupled Teachers},
+     author={Sheng-Chieh Lin and Jheng-Hong Yang and Jimmy Lin},
+     year={2020},
+     eprint={2010.11386},
+     archivePrefix={arXiv},
+     primaryClass={cs.IR},
+     url={https://arxiv.org/abs/2010.11386},
+ }
+ ```
+
+ #### FlopsLoss
+ ```bibtex
+ @article{paria2020minimizing,
+     title={Minimizing flops to learn efficient sparse representations},
+     author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
+     journal={arXiv preprint arXiv:2004.05665},
+     year={2020}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.52.4",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SparseEncoder",
+   "__version__": {
+     "sentence_transformers": "4.2.0.dev0",
+     "transformers": "4.52.4",
+     "pytorch": "2.7.1+cu126"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "dot"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27c5dff4a5fc416ba99ed8bb6f31220ead270fa18a7d9ad8b88356b3a84b32a6
+ size 438080896
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_SpladePooling",
+     "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff