tomaarsen (HF Staff) committed
Commit bf5dab0 · verified · 1 Parent(s): 08c56ed

Add new SparseEncoder model

1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "pooling_strategy": "max",
3
+ "activation_function": "relu",
4
+ "word_embedding_dimension": 30522
5
+ }
README.md ADDED
@@ -0,0 +1,837 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ tags:
6
+ - sentence-transformers
7
+ - sparse-encoder
8
+ - sparse
9
+ - splade
10
+ - generated_from_trainer
11
+ - dataset_size:99000
12
+ - loss:SpladeLoss
13
+ - loss:SparseDistillKLDivMarginMSELoss
14
+ - loss:FlopsLoss
15
+ base_model: Luyu/co-condenser-marco
16
+ widget:
17
+ - text: 'The ejection fraction may decrease if: 1 You have weakness of your heart
18
+ muscle, such as dilated cardiomyopathy, which can be caused by a heart muscle
19
+ problem, familial (genetic) cardiomyopathy, or systemic illnesses. 2 A heart
20
+ attack has damaged your heart. You have problems with your heart''s valves.'
21
+ - text: "One thing we avoided: Lots of alternative slime recipes swap Borax for liquid\
22
+ \ starch, shampoo, body wash, hand soap, contact lens solution, or laundry detergent.\
23
+ \ Those may seem benign — and they might be — but many of them\
24
+ \ contain derivatives or relatives of sodium borate too."
25
+ - text: how do i get my mvr in pa
26
+ - text: English is a language whose vocabulary is the composite of a surprising range
27
+ of influences. We have pillaged words from Latin, Greek, Dutch, Arabic, Old Norse,
28
+ Spanish, Italian, Hindi, and more besides to make English what it is today.
29
+ - text: Weed Eater was a string trimmer company founded in 1971 in Houston, Texas
30
+ by George C. Ballas, Sr. , the inventor of the device. The idea for the Weed Eater
31
+ trimmer came to him from the spinning nylon bristles of an automatic car wash.He
32
+ thought that he could come up with a similar technique to protect the bark on
33
+ trees that he was trimming around. His company was eventually bought by Emerson
34
+ Electric and merged with Poulan.Poulan/Weed Eater was later purchased by Electrolux,
35
+ which spun off the outdoors division as Husqvarna AB in 2006.Inventor Ballas was
36
+ the father of champion ballroom dancer Corky Ballas and the grandfather of Dancing
37
+ with the Stars dancer Mark Ballas.George Ballas died on June 25, 2011.he idea
38
+ for the Weed Eater trimmer came to him from the spinning nylon bristles of an
39
+ automatic car wash. He thought that he could come up with a similar technique
40
+ to protect the bark on trees that he was trimming around. His company was eventually
41
+ bought by Emerson Electric and merged with Poulan.
42
+ pipeline_tag: feature-extraction
43
+ library_name: sentence-transformers
44
+ metrics:
45
+ - dot_accuracy@1
46
+ - dot_accuracy@3
47
+ - dot_accuracy@5
48
+ - dot_accuracy@10
49
+ - dot_precision@1
50
+ - dot_precision@3
51
+ - dot_precision@5
52
+ - dot_precision@10
53
+ - dot_recall@1
54
+ - dot_recall@3
55
+ - dot_recall@5
56
+ - dot_recall@10
57
+ - dot_ndcg@10
58
+ - dot_mrr@10
59
+ - dot_map@100
60
+ - query_active_dims
61
+ - query_sparsity_ratio
62
+ - corpus_active_dims
63
+ - corpus_sparsity_ratio
64
+ co2_eq_emissions:
65
+ emissions: 76.94472370318051
66
+ energy_consumed: 0.19795299150295217
67
+ source: codecarbon
68
+ training_type: fine-tuning
69
+ on_cloud: false
70
+ cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
71
+ ram_total_size: 31.777088165283203
72
+ hours_used: 0.562
73
+ hardware_used: 1 x NVIDIA GeForce RTX 3090
74
+ model-index:
75
+ - name: CoCondenser finetuned on MS MARCO
76
+ results:
77
+ - task:
78
+ type: sparse-information-retrieval
79
+ name: Sparse Information Retrieval
80
+ dataset:
81
+ name: NanoMSMARCO
82
+ type: NanoMSMARCO
83
+ metrics:
84
+ - type: dot_accuracy@1
85
+ value: 0.36
86
+ name: Dot Accuracy@1
87
+ - type: dot_accuracy@3
88
+ value: 0.66
89
+ name: Dot Accuracy@3
90
+ - type: dot_accuracy@5
91
+ value: 0.72
92
+ name: Dot Accuracy@5
93
+ - type: dot_accuracy@10
94
+ value: 0.88
95
+ name: Dot Accuracy@10
96
+ - type: dot_precision@1
97
+ value: 0.36
98
+ name: Dot Precision@1
99
+ - type: dot_precision@3
100
+ value: 0.22
101
+ name: Dot Precision@3
102
+ - type: dot_precision@5
103
+ value: 0.14400000000000002
104
+ name: Dot Precision@5
105
+ - type: dot_precision@10
106
+ value: 0.088
107
+ name: Dot Precision@10
108
+ - type: dot_recall@1
109
+ value: 0.36
110
+ name: Dot Recall@1
111
+ - type: dot_recall@3
112
+ value: 0.66
113
+ name: Dot Recall@3
114
+ - type: dot_recall@5
115
+ value: 0.72
116
+ name: Dot Recall@5
117
+ - type: dot_recall@10
118
+ value: 0.88
119
+ name: Dot Recall@10
120
+ - type: dot_ndcg@10
121
+ value: 0.6103449967613165
122
+ name: Dot Ndcg@10
123
+ - type: dot_mrr@10
124
+ value: 0.5254444444444445
125
+ name: Dot Mrr@10
126
+ - type: dot_map@100
127
+ value: 0.5305650539391958
128
+ name: Dot Map@100
129
+ - type: query_active_dims
130
+ value: 27.84000015258789
131
+ name: Query Active Dims
132
+ - type: query_sparsity_ratio
133
+ value: 0.999087871038838
134
+ name: Query Sparsity Ratio
135
+ - type: corpus_active_dims
136
+ value: 312.8126220703125
137
+ name: Corpus Active Dims
138
+ - type: corpus_sparsity_ratio
139
+ value: 0.9897512410041835
140
+ name: Corpus Sparsity Ratio
141
+ - task:
142
+ type: sparse-information-retrieval
143
+ name: Sparse Information Retrieval
144
+ dataset:
145
+ name: NanoNFCorpus
146
+ type: NanoNFCorpus
147
+ metrics:
148
+ - type: dot_accuracy@1
149
+ value: 0.46
150
+ name: Dot Accuracy@1
151
+ - type: dot_accuracy@3
152
+ value: 0.58
153
+ name: Dot Accuracy@3
154
+ - type: dot_accuracy@5
155
+ value: 0.64
156
+ name: Dot Accuracy@5
157
+ - type: dot_accuracy@10
158
+ value: 0.66
159
+ name: Dot Accuracy@10
160
+ - type: dot_precision@1
161
+ value: 0.46
162
+ name: Dot Precision@1
163
+ - type: dot_precision@3
164
+ value: 0.38
165
+ name: Dot Precision@3
166
+ - type: dot_precision@5
167
+ value: 0.33599999999999997
168
+ name: Dot Precision@5
169
+ - type: dot_precision@10
170
+ value: 0.27
171
+ name: Dot Precision@10
172
+ - type: dot_recall@1
173
+ value: 0.0420665046255695
174
+ name: Dot Recall@1
175
+ - type: dot_recall@3
176
+ value: 0.07348579058152108
177
+ name: Dot Recall@3
178
+ - type: dot_recall@5
179
+ value: 0.11505423806680358
180
+ name: Dot Recall@5
181
+ - type: dot_recall@10
182
+ value: 0.14008288968118893
183
+ name: Dot Recall@10
184
+ - type: dot_ndcg@10
185
+ value: 0.34286821712769383
186
+ name: Dot Ndcg@10
187
+ - type: dot_mrr@10
188
+ value: 0.5336666666666667
189
+ name: Dot Mrr@10
190
+ - type: dot_map@100
191
+ value: 0.15026226864561068
192
+ name: Dot Map@100
193
+ - type: query_active_dims
194
+ value: 23.780000686645508
195
+ name: Query Active Dims
196
+ - type: query_sparsity_ratio
197
+ value: 0.9992208898274476
198
+ name: Query Sparsity Ratio
199
+ - type: corpus_active_dims
200
+ value: 551.4070434570312
201
+ name: Corpus Active Dims
202
+ - type: corpus_sparsity_ratio
203
+ value: 0.9819341116749547
204
+ name: Corpus Sparsity Ratio
205
+ - task:
206
+ type: sparse-information-retrieval
207
+ name: Sparse Information Retrieval
208
+ dataset:
209
+ name: NanoNQ
210
+ type: NanoNQ
211
+ metrics:
212
+ - type: dot_accuracy@1
213
+ value: 0.46
214
+ name: Dot Accuracy@1
215
+ - type: dot_accuracy@3
216
+ value: 0.72
217
+ name: Dot Accuracy@3
218
+ - type: dot_accuracy@5
219
+ value: 0.76
220
+ name: Dot Accuracy@5
221
+ - type: dot_accuracy@10
222
+ value: 0.84
223
+ name: Dot Accuracy@10
224
+ - type: dot_precision@1
225
+ value: 0.46
226
+ name: Dot Precision@1
227
+ - type: dot_precision@3
228
+ value: 0.24
229
+ name: Dot Precision@3
230
+ - type: dot_precision@5
231
+ value: 0.16
232
+ name: Dot Precision@5
233
+ - type: dot_precision@10
234
+ value: 0.08999999999999998
235
+ name: Dot Precision@10
236
+ - type: dot_recall@1
237
+ value: 0.43
238
+ name: Dot Recall@1
239
+ - type: dot_recall@3
240
+ value: 0.67
241
+ name: Dot Recall@3
242
+ - type: dot_recall@5
243
+ value: 0.72
244
+ name: Dot Recall@5
245
+ - type: dot_recall@10
246
+ value: 0.8
247
+ name: Dot Recall@10
248
+ - type: dot_ndcg@10
249
+ value: 0.6351775821531778
250
+ name: Dot Ndcg@10
251
+ - type: dot_mrr@10
252
+ value: 0.5995238095238096
253
+ name: Dot Mrr@10
254
+ - type: dot_map@100
255
+ value: 0.5784818757304737
256
+ name: Dot Map@100
257
+ - type: query_active_dims
258
+ value: 31.780000686645508
259
+ name: Query Active Dims
260
+ - type: query_sparsity_ratio
261
+ value: 0.9989587838055617
262
+ name: Query Sparsity Ratio
263
+ - type: corpus_active_dims
264
+ value: 343.7594909667969
265
+ name: Corpus Active Dims
266
+ - type: corpus_sparsity_ratio
267
+ value: 0.9887373209171484
268
+ name: Corpus Sparsity Ratio
269
+ - task:
270
+ type: sparse-nano-beir
271
+ name: Sparse Nano BEIR
272
+ dataset:
273
+ name: NanoBEIR mean
274
+ type: NanoBEIR_mean
275
+ metrics:
276
+ - type: dot_accuracy@1
277
+ value: 0.4266666666666667
278
+ name: Dot Accuracy@1
279
+ - type: dot_accuracy@3
280
+ value: 0.6533333333333333
281
+ name: Dot Accuracy@3
282
+ - type: dot_accuracy@5
283
+ value: 0.7066666666666667
284
+ name: Dot Accuracy@5
285
+ - type: dot_accuracy@10
286
+ value: 0.7933333333333333
287
+ name: Dot Accuracy@10
288
+ - type: dot_precision@1
289
+ value: 0.4266666666666667
290
+ name: Dot Precision@1
291
+ - type: dot_precision@3
292
+ value: 0.27999999999999997
293
+ name: Dot Precision@3
294
+ - type: dot_precision@5
295
+ value: 0.21333333333333335
296
+ name: Dot Precision@5
297
+ - type: dot_precision@10
298
+ value: 0.14933333333333332
299
+ name: Dot Precision@10
300
+ - type: dot_recall@1
301
+ value: 0.2773555015418565
302
+ name: Dot Recall@1
303
+ - type: dot_recall@3
304
+ value: 0.467828596860507
305
+ name: Dot Recall@3
306
+ - type: dot_recall@5
307
+ value: 0.5183514126889345
308
+ name: Dot Recall@5
309
+ - type: dot_recall@10
310
+ value: 0.6066942965603963
311
+ name: Dot Recall@10
312
+ - type: dot_ndcg@10
313
+ value: 0.5294635986807293
314
+ name: Dot Ndcg@10
315
+ - type: dot_mrr@10
316
+ value: 0.5528783068783069
317
+ name: Dot Mrr@10
318
+ - type: dot_map@100
319
+ value: 0.41976973277176005
320
+ name: Dot Map@100
321
+ - type: query_active_dims
322
+ value: 27.8000005086263
323
+ name: Query Active Dims
324
+ - type: query_sparsity_ratio
325
+ value: 0.9990891815572824
326
+ name: Query Sparsity Ratio
327
+ - type: corpus_active_dims
328
+ value: 378.8387759532669
329
+ name: Corpus Active Dims
330
+ - type: corpus_sparsity_ratio
331
+ value: 0.987588009437348
332
+ name: Corpus Sparsity Ratio
333
+ ---
334
+
335
+ # CoCondenser finetuned on MS MARCO
336
+
337
+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [Luyu/co-condenser-marco](https://huggingface.co/Luyu/co-condenser-marco) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
338
+ ## Model Details
339
+
340
+ ### Model Description
341
+ - **Model Type:** SPLADE Sparse Encoder
342
+ - **Base model:** [Luyu/co-condenser-marco](https://huggingface.co/Luyu/co-condenser-marco) <!-- at revision e0cef0ab2410aae0f0994366ddefb5649a266709 -->
343
+ - **Maximum Sequence Length:** 512 tokens
344
+ - **Output Dimensionality:** 30522 dimensions
345
+ - **Similarity Function:** Dot Product
346
+ <!-- - **Training Dataset:** Unknown -->
347
+ - **Language:** en
348
+ - **License:** apache-2.0
349
+
350
+ ### Model Sources
351
+
352
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
353
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
354
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
355
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
356
+
357
+ ### Full Model Architecture
358
+
359
+ ```
360
+ SparseEncoder(
361
+ (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
362
+ (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
363
+ )
364
+ ```
365
+
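The `SpladePooling` step is what turns the per-token MLM logits into a single sparse vector. As a minimal sketch of the SPLADE formulation from Formal et al. (not this library's internal implementation), the pooling applies `log(1 + ReLU(logits))` as a saturation and then takes the maximum over the sequence:

```python
import torch

def splade_pool(mlm_logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Sketch of SPLADE max pooling: (batch, seq_len, vocab_size) -> (batch, vocab_size)."""
    scores = torch.log1p(torch.relu(mlm_logits))    # saturated, non-negative term weights
    scores = scores * attention_mask.unsqueeze(-1)  # zero out padding positions
    return scores.amax(dim=1)                       # max over the sequence dimension
```

Most entries of the result are exactly zero, which is what makes the 30522-dimensional output usable as a sparse bag-of-terms representation.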
366
+ ## Usage
367
+
368
+ ### Direct Usage (Sentence Transformers)
369
+
370
+ First install the Sentence Transformers library:
371
+
372
+ ```bash
373
+ pip install -U sentence-transformers
374
+ ```
375
+
376
+ Then you can load this model and run inference.
377
+ ```python
378
+ from sentence_transformers import SparseEncoder
379
+
380
+ # Download from the 🤗 Hub
381
+ model = SparseEncoder("tomaarsen/splade-cocondenser-msmarco-kldiv-marginmse-minilm-temp-4")
382
+ # Run inference
383
+ queries = [
384
+ "who started gladiator lacrosse",
385
+ ]
386
+ documents = [
387
+ 'Weed Eater was a string trimmer company founded in 1971 in Houston, Texas by George C. Ballas, Sr. , the inventor of the device. The idea for the Weed Eater trimmer came to him from the spinning nylon bristles of an automatic car wash.He thought that he could come up with a similar technique to protect the bark on trees that he was trimming around. His company was eventually bought by Emerson Electric and merged with Poulan.Poulan/Weed Eater was later purchased by Electrolux, which spun off the outdoors division as Husqvarna AB in 2006.Inventor Ballas was the father of champion ballroom dancer Corky Ballas and the grandfather of Dancing with the Stars dancer Mark Ballas.George Ballas died on June 25, 2011.he idea for the Weed Eater trimmer came to him from the spinning nylon bristles of an automatic car wash. He thought that he could come up with a similar technique to protect the bark on trees that he was trimming around. His company was eventually bought by Emerson Electric and merged with Poulan.',
388
+ "The earliest types of gladiator were named after Rome's enemies of that time: the Samnite, Thracian and Gaul. The Samnite, heavily armed, elegantly helmed and probably the most popular type, was renamed Secutor and the Gaul renamed Murmillo, once these former enemies had been conquered then absorbed into Rome's Empire.",
389
+ 'Summit Hill, PA. Sponsored Topics. Summit Hill is a borough in Carbon County, Pennsylvania, United States. The population was 2,974 at the 2000 census. Summit Hill is located at 40°49′39″N 75°51′57″W / 40.8275°N 75.86583°W / 40.8275; -75.86583 (40.827420, -75.865892).',
390
+ ]
391
+ query_embeddings = model.encode_query(queries)
392
+ document_embeddings = model.encode_document(documents)
393
+ print(query_embeddings.shape, document_embeddings.shape)
394
+ # [1, 30522] [3, 30522]
395
+
396
+ # Get the similarity scores for the embeddings
397
+ similarities = model.similarity(query_embeddings, document_embeddings)
398
+ print(similarities)
399
+ # tensor([[25.0410, 38.7668, 21.0289]])
400
+ ```
401
+
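Each output dimension corresponds to one entry of the 30522-token BERT vocabulary, so the active dimensions can be read back as (token, weight) pairs. A hedged sketch, assuming the embeddings above come back as torch tensors (dense or sparse COO) and that the underlying tokenizer is reachable as `model.tokenizer`:

```python
import torch

emb = document_embeddings[0]
if emb.is_sparse:                 # encode may return sparse COO tensors
    emb = emb.to_dense()
values, indices = torch.topk(emb, k=10)
tokens = model.tokenizer.convert_ids_to_tokens(indices.tolist())
print(list(zip(tokens, values.tolist())))
```

Recent sentence-transformers releases also expose a `decode` helper on `SparseEncoder` for exactly this purpose; check which API your installed version provides.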
402
+ <!--
403
+ ### Direct Usage (Transformers)
404
+
405
+ <details><summary>Click to see the direct usage in Transformers</summary>
406
+
407
+ </details>
408
+ -->
409
+
410
+ <!--
411
+ ### Downstream Usage (Sentence Transformers)
412
+
413
+ You can finetune this model on your own dataset.
414
+
415
+ <details><summary>Click to expand</summary>
416
+
417
+ </details>
418
+ -->
419
+
420
+ <!--
421
+ ### Out-of-Scope Use
422
+
423
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
424
+ -->
425
+
426
+ ## Evaluation
427
+
428
+ ### Metrics
429
+
430
+ #### Sparse Information Retrieval
431
+
432
+ * Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
433
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)
434
+
435
+ | Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ |
436
+ |:----------------------|:------------|:-------------|:-----------|
437
+ | dot_accuracy@1 | 0.36 | 0.46 | 0.46 |
438
+ | dot_accuracy@3 | 0.66 | 0.58 | 0.72 |
439
+ | dot_accuracy@5 | 0.72 | 0.64 | 0.76 |
440
+ | dot_accuracy@10 | 0.88 | 0.66 | 0.84 |
441
+ | dot_precision@1 | 0.36 | 0.46 | 0.46 |
442
+ | dot_precision@3 | 0.22 | 0.38 | 0.24 |
443
+ | dot_precision@5 | 0.144 | 0.336 | 0.16 |
444
+ | dot_precision@10 | 0.088 | 0.27 | 0.09 |
445
+ | dot_recall@1 | 0.36 | 0.0421 | 0.43 |
446
+ | dot_recall@3 | 0.66 | 0.0735 | 0.67 |
447
+ | dot_recall@5 | 0.72 | 0.1151 | 0.72 |
448
+ | dot_recall@10 | 0.88 | 0.1401 | 0.8 |
449
+ | **dot_ndcg@10** | **0.6103** | **0.3429** | **0.6352** |
450
+ | dot_mrr@10 | 0.5254 | 0.5337 | 0.5995 |
451
+ | dot_map@100 | 0.5306 | 0.1503 | 0.5785 |
452
+ | query_active_dims | 27.84 | 23.78 | 31.78 |
453
+ | query_sparsity_ratio | 0.9991 | 0.9992 | 0.999 |
454
+ | corpus_active_dims | 312.8126 | 551.407 | 343.7595 |
455
+ | corpus_sparsity_ratio | 0.9898 | 0.9819 | 0.9887 |
456
+
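The sparsity statistics are consistent with each other if one assumes `sparsity_ratio = 1 - active_dims / 30522` (the output dimensionality); a quick check against the NanoMSMARCO column:

```python
vocab_size = 30522
print(1 - 27.84 / vocab_size)     # 0.9990878... = query_sparsity_ratio above
print(1 - 312.8126 / vocab_size)  # 0.9897512... = corpus_sparsity_ratio above
```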
457
+ #### Sparse Nano BEIR
458
+
459
+ * Dataset: `NanoBEIR_mean`
460
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
461
+ ```json
462
+ {
463
+ "dataset_names": [
464
+ "msmarco",
465
+ "nfcorpus",
466
+ "nq"
467
+ ]
468
+ }
469
+ ```
470
+
471
+ | Metric | Value |
472
+ |:----------------------|:-----------|
473
+ | dot_accuracy@1 | 0.4267 |
474
+ | dot_accuracy@3 | 0.6533 |
475
+ | dot_accuracy@5 | 0.7067 |
476
+ | dot_accuracy@10 | 0.7933 |
477
+ | dot_precision@1 | 0.4267 |
478
+ | dot_precision@3 | 0.28 |
479
+ | dot_precision@5 | 0.2133 |
480
+ | dot_precision@10 | 0.1493 |
481
+ | dot_recall@1 | 0.2774 |
482
+ | dot_recall@3 | 0.4678 |
483
+ | dot_recall@5 | 0.5184 |
484
+ | dot_recall@10 | 0.6067 |
485
+ | **dot_ndcg@10** | **0.5295** |
486
+ | dot_mrr@10 | 0.5529 |
487
+ | dot_map@100 | 0.4198 |
488
+ | query_active_dims | 27.8 |
489
+ | query_sparsity_ratio | 0.9991 |
490
+ | corpus_active_dims | 378.8388 |
491
+ | corpus_sparsity_ratio | 0.9876 |
492
+
493
+ <!--
494
+ ## Bias, Risks and Limitations
495
+
496
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
497
+ -->
498
+
499
+ <!--
500
+ ### Recommendations
501
+
502
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
503
+ -->
504
+
505
+ ## Training Details
506
+
507
+ ### Training Dataset
508
+
509
+ #### Unnamed Dataset
510
+
511
+ * Size: 99,000 training samples
512
+ * Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, and <code>label</code>
513
+ * Approximate statistics based on the first 1000 samples:
514
+ | | query | positive | negative | label |
515
+ |:--------|:--------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------|
516
+ | type | string | string | string | list |
517
+ | details | <ul><li>min: 4 tokens</li><li>mean: 9.2 tokens</li><li>max: 34 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 79.86 tokens</li><li>max: 219 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 79.96 tokens</li><li>max: 270 tokens</li></ul> | <ul><li>size: 2 elements</li></ul> |
518
+ * Samples:
519
+ | query | positive | negative | label |
520
+ |:---------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------|
521
+ | <code>rtn tv network</code> | <code>Home Shopping Network. Home Shopping Network (HSN) is an American broadcast, basic cable and satellite television network that is owned by HSN, Inc. (NASDAQ: HSNI), which also owns catalog company Cornerstone Brands. Based in St. Petersburg, Florida, United States, the home shopping channel has former and current sister channels in several other countries.</code> | <code>The Public Switched Telephone Network - The public switched telephone network (PSTN) is the international network of circuit-switched telephones. Learn more about PSTN at HowStuffWorks. x</code> | <code>[-1.0804121494293213, -5.908488750457764]</code> |
522
+ | <code>how did president nixon react to the watergate investigation?</code> | <code>The Watergate scandal was a major political scandal that occurred in the United States during the early 1970s, following a break-in by five men at the Democratic National Committee headquarters at the Watergate office complex in Washington, D.C. on June 17, 1972, and President Richard Nixon's administration's subsequent attempt to cover up its involvement. After the five burglars were caught and the conspiracy was discovered, Watergate was investigated by the United States Congress. Meanwhile, N</code> | <code>The release of the tape was ordered by the Supreme Court on July 24, 1974, in a case known as United States v. Nixon. The court’s decision was unanimous. President Nixon released the tape on August 5. It was one of three conversations he had with Haldeman six days after the Watergate break-in. The tapes prove that he ordered a cover-up of the Watergate burglary. The Smoking Gun tape reveals that Nixon ordered the FBI to abandon its investigation of the break-in. [Read more…]</code> | <code>[4.117279052734375, 3.191757917404175]</code> |
523
+ | <code>what is a summary offense in pennsylvania</code> | <code>We provide cost effective house arrest and electronic monitoring services to magisterial district court systems throughout Pennsylvania including York, Harrisburg, Philadelphia and Allentown.In addition, we also serve the York County, Lancaster County and Chester County.e provide cost effective house arrest and electronic monitoring services to magisterial district court systems throughout Pennsylvania including York, Harrisburg, Philadelphia and Allentown.</code> | <code>In order to be convicted of Simple Assault, one must cause bodily injury. To be convicted of Aggravated Assault, one must cause serious bodily injury. From my research, Pennsylvania law defines bodily injury as the impairment of physical condition or substantial pain.</code> | <code>[-8.954689025878906, -1.3361705541610718]</code> |
524
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
525
+ ```json
526
+ {
527
+ "loss": "SparseDistillKLDivMarginMSELoss",
528
+ "lambda_corpus": 0.0005,
529
+ "lambda_query": 0.0005
530
+ }
531
+ ```
532
+
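In code, this configuration corresponds to wrapping the distillation loss in `SpladeLoss`, which adds FLOPS regularization on queries and documents (see the FlopsLoss citation below). A hedged sketch using the parameter names logged above; newer sentence-transformers releases may name these `query_regularizer_weight`/`document_regularizer_weight` instead:

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.losses import SpladeLoss, SparseDistillKLDivMarginMSELoss

model = SparseEncoder("Luyu/co-condenser-marco")
loss = SpladeLoss(
    model=model,
    loss=SparseDistillKLDivMarginMSELoss(model),
    lambda_query=0.0005,   # FLOPS regularization weight for query vectors
    lambda_corpus=0.0005,  # FLOPS regularization weight for document vectors
)
```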
533
+ ### Evaluation Dataset
534
+
535
+ #### Unnamed Dataset
536
+
537
+ * Size: 1,000 evaluation samples
538
+ * Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, and <code>label</code>
539
+ * Approximate statistics based on the first 1000 samples:
540
+ | | query | positive | negative | label |
541
+ |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------|
542
+ | type | string | string | string | list |
543
+ | details | <ul><li>min: 4 tokens</li><li>mean: 9.12 tokens</li><li>max: 37 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 78.91 tokens</li><li>max: 239 tokens</li></ul> | <ul><li>min: 25 tokens</li><li>mean: 81.25 tokens</li><li>max: 239 tokens</li></ul> | <ul><li>size: 2 elements</li></ul> |
544
+ * Samples:
545
+ | query | positive | negative | label |
546
+ |:-----------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------|
547
+ | <code>how long to cook roast beef for</code> | <code>Roasting times for beef. Preheat your oven to 160°C (325°F) and use these cooking times to prepare a roast that's moist, tender and delicious. Your roast should be covered with foil for the first half of the roasting time to prevent drying the outer layer.3 to 5lb Joint 1½ to 2 hours.reheat your oven to 160°C (325°F) and use these cooking times to prepare a roast that's moist, tender and delicious. Your roast should be covered with foil for the first half of the roasting time to prevent drying the outer layer.</code> | <code>Estimating Cooking Time for Large Beef Roasts. If you roast at a steady 325F (160C), subtract 2 minutes or so per pound. If the roast is refrigerated just before going into the oven, add 2 or 3 minutes per pound. WARNING NOTES: Remember, the rib roast will continue to cook as it sets.</code> | <code>[6.501978874206543, 8.214995384216309]</code> |
548
+ | <code>definition of fire inspection</code> | <code>Learn how to do a monthly fire extinguisher inspection in your workplace. Departments must assign an individual to inspect monthly the extinguishers in or adjacent to the department's facilities.1 Read Fire Extinguisher Types and Maintenance for more information.earn how to do a monthly fire extinguisher inspection in your workplace. Departments must assign an individual to inspect monthly the extinguishers in or adjacent to the department's facilities.</code> | <code>reconnaissance by fire-a method of reconnaissance in which fire is placed on a suspected enemy position in order to cause the enemy to disclose his presence by moving or returning fire. reconnaissance in force-an offensive operation designed to discover or test the enemy's strength (or to obtain other information). mission undertaken to obtain, by visual observation or other detection methods, information about the activities and resources of an enemy or potential enemy, or to secure data concerning the meteorological, hydrographic, or geographic characteristics of a particular area.</code> | <code>[-0.38299351930618286, -0.9372650384902954]</code> |
549
+ | <code>how many stores does family dollar have</code> | <code>Property Spotlight: New Retail Center at Hamilton & Warner - Outlots Available!! Family Dollar is closing stores following a disappointing second quarter. Family Dollar Stores Inc. won’t just be cutting prices in an attempt to boost its business – it’ll be closing stores as well. The Matthews, N.C.-based discount retailer plans to shutter 370 under-performing shops, according to the Charlotte Business Journal.</code> | <code>Glassdoor has 1,976 Family Dollar Stores reviews submitted anonymously by Family Dollar Stores employees. Read employee reviews and ratings on Glassdoor to decide if Family Dollar Stores is right for you.</code> | <code>[4.726407527923584, 8.284608840942383]</code> |
550
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
551
+ ```json
552
+ {
553
+ "loss": "SparseDistillKLDivMarginMSELoss",
554
+ "lambda_corpus": 0.0005,
555
+ "lambda_query": 0.0005
556
+ }
557
+ ```
558
+
559
+ ### Training Hyperparameters
560
+ #### Non-Default Hyperparameters
561
+
562
+ - `eval_strategy`: steps
563
+ - `per_device_train_batch_size`: 16
564
+ - `per_device_eval_batch_size`: 16
565
+ - `learning_rate`: 2e-05
566
+ - `num_train_epochs`: 1
567
+ - `warmup_ratio`: 0.1
568
+ - `fp16`: True
569
+ - `batch_sampler`: no_duplicates
570
+
571
+ #### All Hyperparameters
572
+ <details><summary>Click to expand</summary>
573
+
574
+ - `overwrite_output_dir`: False
575
+ - `do_predict`: False
576
+ - `eval_strategy`: steps
577
+ - `prediction_loss_only`: True
578
+ - `per_device_train_batch_size`: 16
579
+ - `per_device_eval_batch_size`: 16
580
+ - `per_gpu_train_batch_size`: None
581
+ - `per_gpu_eval_batch_size`: None
582
+ - `gradient_accumulation_steps`: 1
583
+ - `eval_accumulation_steps`: None
584
+ - `torch_empty_cache_steps`: None
585
+ - `learning_rate`: 2e-05
586
+ - `weight_decay`: 0.0
587
+ - `adam_beta1`: 0.9
588
+ - `adam_beta2`: 0.999
589
+ - `adam_epsilon`: 1e-08
590
+ - `max_grad_norm`: 1.0
591
+ - `num_train_epochs`: 1
592
+ - `max_steps`: -1
593
+ - `lr_scheduler_type`: linear
594
+ - `lr_scheduler_kwargs`: {}
595
+ - `warmup_ratio`: 0.1
596
+ - `warmup_steps`: 0
597
+ - `log_level`: passive
598
+ - `log_level_replica`: warning
599
+ - `log_on_each_node`: True
600
+ - `logging_nan_inf_filter`: True
601
+ - `save_safetensors`: True
602
+ - `save_on_each_node`: False
603
+ - `save_only_model`: False
604
+ - `restore_callback_states_from_checkpoint`: False
605
+ - `no_cuda`: False
606
+ - `use_cpu`: False
607
+ - `use_mps_device`: False
608
+ - `seed`: 42
609
+ - `data_seed`: None
610
+ - `jit_mode_eval`: False
611
+ - `use_ipex`: False
612
+ - `bf16`: False
613
+ - `fp16`: True
614
+ - `fp16_opt_level`: O1
615
+ - `half_precision_backend`: auto
616
+ - `bf16_full_eval`: False
617
+ - `fp16_full_eval`: False
618
+ - `tf32`: None
619
+ - `local_rank`: 0
620
+ - `ddp_backend`: None
621
+ - `tpu_num_cores`: None
622
+ - `tpu_metrics_debug`: False
623
+ - `debug`: []
624
+ - `dataloader_drop_last`: False
625
+ - `dataloader_num_workers`: 0
626
+ - `dataloader_prefetch_factor`: None
627
+ - `past_index`: -1
628
+ - `disable_tqdm`: False
629
+ - `remove_unused_columns`: True
630
+ - `label_names`: None
631
+ - `load_best_model_at_end`: False
632
+ - `ignore_data_skip`: False
633
+ - `fsdp`: []
634
+ - `fsdp_min_num_params`: 0
635
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
636
+ - `fsdp_transformer_layer_cls_to_wrap`: None
637
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
638
+ - `deepspeed`: None
639
+ - `label_smoothing_factor`: 0.0
640
+ - `optim`: adamw_torch
641
+ - `optim_args`: None
642
+ - `adafactor`: False
643
+ - `group_by_length`: False
644
+ - `length_column_name`: length
645
+ - `ddp_find_unused_parameters`: None
646
+ - `ddp_bucket_cap_mb`: None
647
+ - `ddp_broadcast_buffers`: False
648
+ - `dataloader_pin_memory`: True
649
+ - `dataloader_persistent_workers`: False
650
+ - `skip_memory_metrics`: True
651
+ - `use_legacy_prediction_loop`: False
652
+ - `push_to_hub`: False
653
+ - `resume_from_checkpoint`: None
654
+ - `hub_model_id`: None
655
+ - `hub_strategy`: every_save
656
+ - `hub_private_repo`: None
657
+ - `hub_always_push`: False
658
+ - `gradient_checkpointing`: False
659
+ - `gradient_checkpointing_kwargs`: None
660
+ - `include_inputs_for_metrics`: False
661
+ - `include_for_metrics`: []
662
+ - `eval_do_concat_batches`: True
663
+ - `fp16_backend`: auto
664
+ - `push_to_hub_model_id`: None
665
+ - `push_to_hub_organization`: None
666
+ - `mp_parameters`:
667
+ - `auto_find_batch_size`: False
668
+ - `full_determinism`: False
669
+ - `torchdynamo`: None
670
+ - `ray_scope`: last
671
+ - `ddp_timeout`: 1800
672
+ - `torch_compile`: False
673
+ - `torch_compile_backend`: None
674
+ - `torch_compile_mode`: None
675
+ - `include_tokens_per_second`: False
676
+ - `include_num_input_tokens_seen`: False
677
+ - `neftune_noise_alpha`: None
678
+ - `optim_target_modules`: None
679
+ - `batch_eval_metrics`: False
680
+ - `eval_on_start`: False
681
+ - `use_liger_kernel`: False
682
+ - `eval_use_gather_object`: False
683
+ - `average_tokens_across_devices`: False
684
+ - `prompts`: None
685
+ - `batch_sampler`: no_duplicates
686
+ - `multi_dataset_batch_sampler`: proportional
687
+ - `router_mapping`: {}
688
+ - `learning_rate_mapping`: {}
689
+
690
+ </details>
691
+
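The non-default hyperparameters above map onto the sentence-transformers training API roughly as follows. This is a sketch, assuming the `SparseEncoderTrainer`/`SparseEncoderTrainingArguments` classes of the sparse-encoder feature set, with `model`, `loss`, `train_dataset`, and `eval_dataset` as defined in the previous sections and a hypothetical `output_dir`:

```python
from sentence_transformers import SparseEncoderTrainer, SparseEncoderTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SparseEncoderTrainingArguments(
    output_dir="models/splade-cocondenser-msmarco",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
trainer = SparseEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```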
692
+ ### Training Logs
693
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
694
+ |:------:|:----:|:-------------:|:---------------:|:-----------------------:|:------------------------:|:------------------:|:-------------------------:|
695
+ | -1 | -1 | - | - | 0.0823 | 0.0412 | 0.0621 | 0.0619 |
696
+ | 0.0162 | 100 | 24464.415 | - | - | - | - | - |
697
+ | 0.0323 | 200 | 561.543 | - | - | - | - | - |
698
+ | 0.0485 | 300 | 4.9817 | - | - | - | - | - |
699
+ | 0.0646 | 400 | 3.1592 | - | - | - | - | - |
700
+ | 0.0808 | 500 | 2.9133 | 2.8812 | 0.1731 | 0.1016 | 0.1995 | 0.1581 |
701
+ | 0.0970 | 600 | 2.7573 | - | - | - | - | - |
702
+ | 0.1131 | 700 | 2.5302 | - | - | - | - | - |
703
+ | 0.1293 | 800 | 2.2267 | - | - | - | - | - |
704
+ | 0.1454 | 900 | 2.4419 | - | - | - | - | - |
705
+ | 0.1616 | 1000 | 2.2217 | 1.8423 | 0.5140 | 0.3222 | 0.5659 | 0.4674 |
706
+ | 0.1778 | 1100 | 1.9851 | - | - | - | - | - |
707
+ | 0.1939 | 1200 | 2.0402 | - | - | - | - | - |
708
+ | 0.2101 | 1300 | 1.8291 | - | - | - | - | - |
709
+ | 0.2262 | 1400 | 1.816 | - | - | - | - | - |
710
+ | 0.2424 | 1500 | 1.8955 | 1.6957 | 0.5223 | 0.3237 | 0.5930 | 0.4797 |
711
+ | 0.2586 | 1600 | 1.7903 | - | - | - | - | - |
712
+ | 0.2747 | 1700 | 1.7841 | - | - | - | - | - |
713
+ | 0.2909 | 1800 | 1.7548 | - | - | - | - | - |
714
+ | 0.3070 | 1900 | 1.7477 | - | - | - | - | - |
715
+ | 0.3232 | 2000 | 1.6918 | 1.5119 | 0.5727 | 0.3464 | 0.6342 | 0.5178 |
716
+ | 0.3394 | 2100 | 1.5736 | - | - | - | - | - |
717
+ | 0.3555 | 2200 | 1.5365 | - | - | - | - | - |
718
+ | 0.3717 | 2300 | 1.6309 | - | - | - | - | - |
719
+ | 0.3878 | 2400 | 1.531 | - | - | - | - | - |
720
+ | 0.4040 | 2500 | 1.6303 | 1.4216 | 0.5803 | 0.3340 | 0.6155 | 0.5099 |
721
+ | 0.4202 | 2600 | 1.5792 | - | - | - | - | - |
722
+ | 0.4363 | 2700 | 1.5035 | - | - | - | - | - |
723
+ | 0.4525 | 2800 | 1.5344 | - | - | - | - | - |
724
+ | 0.4686 | 2900 | 1.3434 | - | - | - | - | - |
725
+ | 0.4848 | 3000 | 1.3977 | 1.4396 | 0.5915 | 0.3539 | 0.6606 | 0.5353 |
726
+ | 0.5010 | 3100 | 1.4321 | - | - | - | - | - |
727
+ | 0.5171 | 3200 | 1.3259 | - | - | - | - | - |
728
+ | 0.5333 | 3300 | 1.4871 | - | - | - | - | - |
729
+ | 0.5495 | 3400 | 1.4436 | - | - | - | - | - |
730
+ | 0.5656 | 3500 | 1.5346 | 1.3088 | 0.6010 | 0.3261 | 0.6586 | 0.5286 |
731
+ | 0.5818 | 3600 | 1.3722 | - | - | - | - | - |
732
+ | 0.5979 | 3700 | 1.4081 | - | - | - | - | - |
733
+ | 0.6141 | 3800 | 1.3011 | - | - | - | - | - |
734
+ | 0.6303 | 3900 | 1.2841 | - | - | - | - | - |
735
+ | 0.6464 | 4000 | 1.2834 | 1.3057 | 0.6106 | 0.3372 | 0.6381 | 0.5286 |
736
+ | 0.6626 | 4100 | 1.3188 | - | - | - | - | - |
737
+ | 0.6787 | 4200 | 1.2799 | - | - | - | - | - |
738
+ | 0.6949 | 4300 | 1.2927 | - | - | - | - | - |
739
+ | 0.7111 | 4400 | 1.2788 | - | - | - | - | - |
740
+ | 0.7272 | 4500 | 1.3151 | 1.2213 | 0.6068 | 0.3401 | 0.6306 | 0.5258 |
741
+ | 0.7434 | 4600 | 1.2763 | - | - | - | - | - |
742
+ | 0.7595 | 4700 | 1.1939 | - | - | - | - | - |
743
+ | 0.7757 | 4800 | 1.2794 | - | - | - | - | - |
744
+ | 0.7919 | 4900 | 1.2333 | - | - | - | - | - |
745
+ | 0.8080 | 5000 | 1.2885 | 1.1260 | 0.6069 | 0.3455 | 0.6546 | 0.5357 |
746
+ | 0.8242 | 5100 | 1.2065 | - | - | - | - | - |
747
+ | 0.8403 | 5200 | 1.173 | - | - | - | - | - |
748
+ | 0.8565 | 5300 | 1.2849 | - | - | - | - | - |
749
+ | 0.8727 | 5400 | 1.1891 | - | - | - | - | - |
750
+ | 0.8888 | 5500 | 1.1292 | 1.1857 | 0.6172 | 0.3619 | 0.6388 | 0.5393 |
751
+ | 0.9050 | 5600 | 1.2337 | - | - | - | - | - |
752
+ | 0.9211 | 5700 | 1.1533 | - | - | - | - | - |
753
+ | 0.9373 | 5800 | 1.1776 | - | - | - | - | - |
754
+ | 0.9535 | 5900 | 1.1447 | - | - | - | - | - |
755
+ | 0.9696 | 6000 | 1.1829 | 1.1114 | 0.6130 | 0.3436 | 0.6261 | 0.5276 |
756
+ | 0.9858 | 6100 | 1.2754 | - | - | - | - | - |
757
+ | -1 | -1 | - | - | 0.6103 | 0.3429 | 0.6352 | 0.5295 |
758
+
759
+
760
+ ### Environmental Impact
761
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
762
+ - **Energy Consumed**: 0.198 kWh
763
+ - **Carbon Emitted**: 0.077 kg of CO2
764
+ - **Hours Used**: 0.562 hours
765
+
766
+ ### Training Hardware
767
+ - **On Cloud**: No
768
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
769
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
770
+ - **RAM Size**: 31.78 GB
771
+
772
+ ### Framework Versions
773
+ - Python: 3.11.6
774
+ - Sentence Transformers: 4.2.0.dev0
775
+ - Transformers: 4.52.4
776
+ - PyTorch: 2.7.1+cu126
777
+ - Accelerate: 1.5.1
778
+ - Datasets: 2.21.0
779
+ - Tokenizers: 0.21.1
780
+
781
+ ## Citation
782
+
783
+ ### BibTeX
784
+
785
+ #### Sentence Transformers
786
+ ```bibtex
787
+ @inproceedings{reimers-2019-sentence-bert,
788
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
789
+ author = "Reimers, Nils and Gurevych, Iryna",
790
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
791
+ month = "11",
792
+ year = "2019",
793
+ publisher = "Association for Computational Linguistics",
794
+ url = "https://arxiv.org/abs/1908.10084",
795
+ }
796
+ ```
797
+
798
+ #### SpladeLoss
799
+ ```bibtex
800
+ @misc{formal2022distillationhardnegativesampling,
801
+ title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
802
+ author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
803
+ year={2022},
804
+ eprint={2205.04733},
805
+ archivePrefix={arXiv},
806
+ primaryClass={cs.IR},
807
+ url={https://arxiv.org/abs/2205.04733},
808
+ }
809
+ ```
810
+
811
+ #### FlopsLoss
812
+ ```bibtex
813
+ @article{paria2020minimizing,
814
+ title={Minimizing flops to learn efficient sparse representations},
815
+ author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
816
+ journal={arXiv preprint arXiv:2004.05665},
817
+ year={2020}
818
+ }
819
+ ```
820
+
821
+ <!--
822
+ ## Glossary
823
+
824
+ *Clearly define terms in order to be accessible across audiences.*
825
+ -->
826
+
827
+ <!--
828
+ ## Model Card Authors
829
+
830
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
831
+ -->
832
+
833
+ <!--
834
+ ## Model Card Contact
835
+
836
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
837
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "architectures": [
3
+ "BertForMaskedLM"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "classifier_dropout": null,
7
+ "gradient_checkpointing": false,
8
+ "hidden_act": "gelu",
9
+ "hidden_dropout_prob": 0.1,
10
+ "hidden_size": 768,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 3072,
13
+ "layer_norm_eps": 1e-12,
14
+ "max_position_embeddings": 512,
15
+ "model_type": "bert",
16
+ "num_attention_heads": 12,
17
+ "num_hidden_layers": 12,
18
+ "pad_token_id": 0,
19
+ "position_embedding_type": "absolute",
20
+ "torch_dtype": "float32",
21
+ "transformers_version": "4.52.4",
22
+ "type_vocab_size": 2,
23
+ "use_cache": true,
24
+ "vocab_size": 30522
25
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SparseEncoder",
3
+ "__version__": {
4
+ "sentence_transformers": "4.2.0.dev0",
5
+ "transformers": "4.52.4",
6
+ "pytorch": "2.7.1+cu126"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "dot"
14
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6dd6b84f6676302aae58c22aa17008b4d8f454053b1d72f8232e2cc5ef32fccc
3
+ size 438080896
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_SpladePooling",
12
+ "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
13
+ }
14
+ ]
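`modules.json` tells sentence-transformers to chain the two modules in order: the MLM transformer stored at the repository root, then the SPLADE pooling stored in `1_SpladePooling/`. Assembling the equivalent pipeline by hand would look roughly like this (a sketch; constructor arguments are assumed from the configs in this commit rather than taken from documentation):

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.models import MLMTransformer, SpladePooling

mlm = MLMTransformer("Luyu/co-condenser-marco", max_seq_length=512)
pooling = SpladePooling(pooling_strategy="max", activation_function="relu")
model = SparseEncoder(modules=[mlm, pooling])
```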
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff