tomaarsen HF Staff committed on
Commit f75f9d8 · verified · 1 Parent(s): 0224084

Add new SparseEncoder model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
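
This pooling config selects CLS pooling: the sentence embedding is the hidden state of the first (`[CLS]`) token rather than a mean over all tokens. A minimal sketch of what the Pooling module computes under this config (shapes are illustrative):

```python
import torch

# With pooling_mode_cls_token=true, pooling reduces the token embeddings
# of shape (batch, seq_len, 1024) to the [CLS] vector of shape (batch, 1024).
token_embeddings = torch.randn(2, 16, 1024)   # dummy BertModel output
sentence_embeddings = token_embeddings[:, 0]  # hidden state at position 0
print(sentence_embeddings.shape)              # torch.Size([2, 1024])
```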
2_CSRSparsity/config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "input_dim": 1024,
+   "hidden_dim": 4096,
+   "k": 256,
+   "k_aux": 512,
+   "normalize": false,
+   "dead_threshold": 30
+ }
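
This is the Contrastive Sparse Representation (CSR) head: a sparse-autoencoder-style module that up-projects the 1024-dim pooled embedding to 4096 dims and keeps only the `k = 256` largest activations (`k_aux = 512` feeds an auxiliary loss that revives latents that have been inactive for more than `dead_threshold` steps during training). A rough sketch of the top-k step; the `encoder` projection here is an illustrative stand-in, not the library's internals:

```python
import torch
import torch.nn as nn

encoder = nn.Linear(1024, 4096)  # stand-in for the learned up-projection

def topk_sparsify(x: torch.Tensor, k: int = 256) -> torch.Tensor:
    """Keep each row's k largest activations; zero out everything else."""
    z = torch.relu(encoder(x))
    values, indices = z.topk(k, dim=-1)
    return torch.zeros_like(z).scatter_(-1, indices, values)

embeddings = topk_sparsify(torch.randn(2, 1024))
print((embeddings != 0).sum(dim=-1))  # at most 256 non-zeros per row
```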
2_CSRSparsity/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fbe29eba6d8321d8badbd9c18d1f92dff5141f253a62846c85005faefb04cb62
+ size 16830864
README.md ADDED
@@ -0,0 +1,1306 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - sparse-encoder
+ - sparse
+ - csr
+ - generated_from_trainer
+ - dataset_size:3011496
+ - loss:CSRLoss
+ - loss:SparseMultipleNegativesRankingLoss
+ base_model: mixedbread-ai/mxbai-embed-large-v1
+ widget:
+ - source_sentence: how much is a car title transfer in minnesota?
+   sentences:
+   - This complex is a larger molecule than the original crystal violet stain and iodine and is insoluble in water. ... Conversely, the the outer membrane of Gram negative bacteria is degraded and the thinner peptidoglycan layer of Gram negative cells is unable to retain the crystal violet-iodine complex and the color is lost.
+   - Get insurance on the car and provide proof. Bring this information (including the title) to the Minnesota DVS office, as well as $10 for the filing fee and $7.25 for the titling fee. There is also a $10 transfer tax, as well as a 6.5% sales tax on the purchase price.
+   - 'One of the risks of DNP is that it accelerates the metabolism to a dangerously fast level. Our metabolic system operates at the rate it does for a reason – it is safe. Speeding up the metabolism may help burn off fat, but it can also trigger a number of potentially dangerous side effects, such as: fever.'
+ - source_sentence: what is the difference between 18 and 20 inch tires?
+   sentences:
+   - The only real difference is a 20" rim would be more likely to be damaged, as you pointed out. Beyond looks, there is zero benefit for the 20" rim. Also, just the availability of tires will likely be much more limited for the larger rim. ... Tire selection is better for 18" wheels than 20" wheels.
+   - '[''Open your Outlook app on your mobile device and click on the Settings gear icon.'', ''Under Settings, click on the Signature option.'', ''Enter either a generic signature that could be used for all email accounts tied to your Outlook app, or a specific signature, Per Account Signature, for each email account.'']'
+   - The average normal body temperature is around 98.6 degrees Fahrenheit, or 37 degrees Celsius. If your body temperature drops to just a few degrees lower than this, your blood vessels in your hands, feet, arms, and legs start to get narrower.
+ - source_sentence: whom the bell tolls meaning?
+   sentences:
+   - 'Answer: Humans are depicted in Hindu art often in sensuous and erotic postures.'
+   - The phrase "For whom the bell tolls" refers to the church bells that are rung when a person dies. Hence, the author is suggesting that we should not be curious as to for whom the church bell is tolling for. It is for all of us.
+   - '[''Automatically.'', ''When connected to car Bluetooth and,'', ''Manually.'']'
+ - source_sentence: how long before chlamydia symptoms appear?
+   sentences:
+   - Most people who have chlamydia don't notice any symptoms. If you do get symptoms, these usually appear between 1 and 3 weeks after having unprotected sex with an infected person. For some people they don't develop until many months later. Sometimes the symptoms can disappear after a few days.
+   - '[''Open the My Verizon app . ... '', ''Tap the Menu icon. ... '', ''Tap Manage device for the appropriate mobile number. ... '', ''Tap Transfer content between phones. ... '', ''Tap Start Transfer.'']'
+   - 'Psychiatrist vs Psychologist A psychiatrist is classed as a medical doctor, they include a physical examination of symptoms in their assessment and are able to prescribe medicine: a psychologist is also a doctor by virtue of their PHD level qualification, but is not medically trained and cannot prescribe.'
+ - source_sentence: are you human korean novela?
+   sentences:
+   - Many cysts heal on their own, which means that conservative treatments like rest and anti-inflammatory painkillers can often be enough to get rid of them. However, in some cases, routine drainage of the sac may be necessary to reduce symptoms.
+   - A relative of European pear varieties like Bartlett and Anjou, the Asian pear is great used in recipes or simply eaten out of hand. It retains a crispness that works well in slaws and salads, and it holds its shape better than European pears when baked and cooked.
+   - 'Are You Human? (Korean: 너도 인간이니; RR: Neodo Inganini; lit. Are You Human Too?) is a 2018 South Korean television series starring Seo Kang-jun and Gong Seung-yeon. It aired on KBS2''s Mondays and Tuesdays at 22:00 (KST) time slot, from June 4 to August 7, 2018.'
+ datasets:
+ - sentence-transformers/gooaq
+ pipeline_tag: feature-extraction
+ library_name: sentence-transformers
+ metrics:
+ - dot_accuracy@1
+ - dot_accuracy@3
+ - dot_accuracy@5
+ - dot_accuracy@10
+ - dot_precision@1
+ - dot_precision@3
+ - dot_precision@5
+ - dot_precision@10
+ - dot_recall@1
+ - dot_recall@3
+ - dot_recall@5
+ - dot_recall@10
+ - dot_ndcg@10
+ - dot_mrr@10
+ - dot_map@100
+ - row_non_zero_mean_query
+ - row_sparsity_mean_query
+ - row_non_zero_mean_corpus
+ - row_sparsity_mean_corpus
+ co2_eq_emissions:
+   emissions: 467.36155743833086
+   energy_consumed: 1.2023646840981803
+   source: codecarbon
+   training_type: fine-tuning
+   on_cloud: false
+   cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
+   ram_total_size: 31.777088165283203
+   hours_used: 3.125
+   hardware_used: 1 x NVIDIA GeForce RTX 3090
+ model-index:
+ - name: Sparse CSR model trained on GooAQ
+   results:
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: NanoMSMARCO 128
+       type: NanoMSMARCO_128
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.42
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.64
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.68
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.8
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.42
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.21333333333333332
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.136
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.08
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.42
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.64
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.68
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.8
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.6079185617079585
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.5469047619047619
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.5546949863343481
+       name: Dot Map@100
+     - type: row_non_zero_mean_query
+       value: 128.0
+       name: Row Non Zero Mean Query
+     - type: row_sparsity_mean_query
+       value: 0.96875
+       name: Row Sparsity Mean Query
+     - type: row_non_zero_mean_corpus
+       value: 128.0
+       name: Row Non Zero Mean Corpus
+     - type: row_sparsity_mean_corpus
+       value: 0.96875
+       name: Row Sparsity Mean Corpus
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: NanoNFCorpus 128
+       type: NanoNFCorpus_128
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.28
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.46
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.58
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.66
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.28
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.2866666666666667
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.28
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.24600000000000002
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.010077778443246685
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.04965300165842144
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.07680443441830657
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.10785346110615711
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.27112973349418856
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.3951904761904761
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.10882673834779542
+       name: Dot Map@100
+     - type: row_non_zero_mean_query
+       value: 128.0
+       name: Row Non Zero Mean Query
+     - type: row_sparsity_mean_query
+       value: 0.96875
+       name: Row Sparsity Mean Query
+     - type: row_non_zero_mean_corpus
+       value: 128.0
+       name: Row Non Zero Mean Corpus
+     - type: row_sparsity_mean_corpus
+       value: 0.96875
+       name: Row Sparsity Mean Corpus
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: NanoNQ 128
+       type: NanoNQ_128
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.46
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.62
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.7
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.82
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.46
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.20666666666666667
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.14
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.08199999999999999
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.44
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.58
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.65
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.76
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.5976862103963738
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.5692222222222223
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.5513454286143362
+       name: Dot Map@100
+     - type: row_non_zero_mean_query
+       value: 128.0
+       name: Row Non Zero Mean Query
+     - type: row_sparsity_mean_query
+       value: 0.96875
+       name: Row Sparsity Mean Query
+     - type: row_non_zero_mean_corpus
+       value: 128.0
+       name: Row Non Zero Mean Corpus
+     - type: row_sparsity_mean_corpus
+       value: 0.96875
+       name: Row Sparsity Mean Corpus
+   - task:
+       type: sparse-nano-beir
+       name: Sparse Nano BEIR
+     dataset:
+       name: NanoBEIR mean 128
+       type: NanoBEIR_mean_128
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.38666666666666666
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.5733333333333334
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.6533333333333333
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.7599999999999999
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.38666666666666666
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.23555555555555555
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.18533333333333335
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.136
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.2900259261477489
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.4232176672194738
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.4689348114727689
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.5559511537020524
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.49224483519950696
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.5037724867724868
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.4049557177654933
+       name: Dot Map@100
+     - type: row_non_zero_mean_query
+       value: 128.0
+       name: Row Non Zero Mean Query
+     - type: row_sparsity_mean_query
+       value: 0.96875
+       name: Row Sparsity Mean Query
+     - type: row_non_zero_mean_corpus
+       value: 128.0
+       name: Row Non Zero Mean Corpus
+     - type: row_sparsity_mean_corpus
+       value: 0.96875
+       name: Row Sparsity Mean Corpus
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: NanoMSMARCO 256
+       type: NanoMSMARCO_256
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.42
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.7
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.76
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.84
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.42
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.2333333333333333
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.15200000000000002
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.08399999999999999
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.42
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.7
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.76
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.84
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.6326016391887893
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.566111111111111
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.5727341193854673
+       name: Dot Map@100
+     - type: row_non_zero_mean_query
+       value: 256.0
+       name: Row Non Zero Mean Query
+     - type: row_sparsity_mean_query
+       value: 0.9375
+       name: Row Sparsity Mean Query
+     - type: row_non_zero_mean_corpus
+       value: 256.0
+       name: Row Non Zero Mean Corpus
+     - type: row_sparsity_mean_corpus
+       value: 0.9375
+       name: Row Sparsity Mean Corpus
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: NanoNFCorpus 256
+       type: NanoNFCorpus_256
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.32
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.56
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.62
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.7
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.32
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.31999999999999995
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.316
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.262
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.030392237560226815
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.0717373009745601
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.09312218308574575
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.133341363492939
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.30709320262394824
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.45252380952380944
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.14302697817666413
+       name: Dot Map@100
+     - type: row_non_zero_mean_query
+       value: 256.0
+       name: Row Non Zero Mean Query
+     - type: row_sparsity_mean_query
+       value: 0.9375
+       name: Row Sparsity Mean Query
+     - type: row_non_zero_mean_corpus
+       value: 256.0
+       name: Row Non Zero Mean Corpus
+     - type: row_sparsity_mean_corpus
+       value: 0.9375
+       name: Row Sparsity Mean Corpus
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: NanoNQ 256
+       type: NanoNQ_256
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.42
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.64
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.68
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.84
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.42
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.22
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.14
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.088
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.4
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.6
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.63
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.79
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.594269599796927
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.5505952380952379
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.5330295920949546
+       name: Dot Map@100
+     - type: row_non_zero_mean_query
+       value: 256.0
+       name: Row Non Zero Mean Query
+     - type: row_sparsity_mean_query
+       value: 0.9375
+       name: Row Sparsity Mean Query
+     - type: row_non_zero_mean_corpus
+       value: 256.0
+       name: Row Non Zero Mean Corpus
+     - type: row_sparsity_mean_corpus
+       value: 0.9375
+       name: Row Sparsity Mean Corpus
+   - task:
+       type: sparse-nano-beir
+       name: Sparse Nano BEIR
+     dataset:
+       name: NanoBEIR mean 256
+       type: NanoBEIR_mean_256
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.38666666666666666
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.6333333333333333
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.6866666666666666
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.7933333333333333
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.38666666666666666
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.2577777777777777
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.2026666666666667
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.14466666666666664
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.28346407918674227
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.45724576699152
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.4943740610285819
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.5877804544976463
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.5113214805365548
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.5230767195767194
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.41626356321902863
+       name: Dot Map@100
+     - type: row_non_zero_mean_query
+       value: 256.0
+       name: Row Non Zero Mean Query
+     - type: row_sparsity_mean_query
+       value: 0.9375
+       name: Row Sparsity Mean Query
+     - type: row_non_zero_mean_corpus
+       value: 256.0
+       name: Row Non Zero Mean Corpus
+     - type: row_sparsity_mean_corpus
+       value: 0.9375
+       name: Row Sparsity Mean Corpus
+ ---
+ 
+ # Sparse CSR model trained on GooAQ
+ 
+ This is a [CSR Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) on the [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq) dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 4096-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** CSR Sparse Encoder
+ - **Base model:** [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) <!-- at revision db9d1fe0f31addb4978201b2bf3e577f3f8900d2 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 4096 dimensions
+ - **Similarity Function:** Dot Product
+ - **Training Dataset:**
+     - [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq)
+ - **Language:** en
+ - **License:** apache-2.0
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SparseEncoder(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): CSRSparsity({'input_dim': 1024, 'hidden_dim': 4096, 'k': 256, 'k_aux': 512, 'normalize': False, 'dead_threshold': 30})
+ )
+ ```
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SparseEncoder
+ 
+ # Download from the 🤗 Hub
+ model = SparseEncoder("tomaarsen/csr-mxbai-embed-large-v1-gooaq-2e-4")
+ # Run inference
+ sentences = [
+     'are you human korean novela?',
+     "Are You Human? (Korean: 너도 인간이니; RR: Neodo Inganini; lit. Are You Human Too?) is a 2018 South Korean television series starring Seo Kang-jun and Gong Seung-yeon. It aired on KBS2's Mondays and Tuesdays at 22:00 (KST) time slot, from June 4 to August 7, 2018.",
+     'A relative of European pear varieties like Bartlett and Anjou, the Asian pear is great used in recipes or simply eaten out of hand. It retains a crispness that works well in slaws and salads, and it holds its shape better than European pears when baked and cooked.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 4096)
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
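
Since the CSRSparsity head keeps `k = 256` activations, each embedding above should contain at most 256 non-zero values. A quick check, assuming `encode` returned a (possibly sparse) torch tensor as in the snippet:

```python
import torch

# Densify if needed, then count active dimensions per sentence.
dense = embeddings.to_dense() if embeddings.is_sparse else embeddings
print((dense != 0).sum(dim=1))  # expected: at most 256 per row
```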
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Sparse Information Retrieval
+ 
+ * Datasets: `NanoMSMARCO_128`, `NanoNFCorpus_128` and `NanoNQ_128`
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator) with these parameters:
+   ```json
+   {
+       "max_active_dims": 128
+   }
+   ```
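
`max_active_dims` truncates each embedding to its strongest activations at evaluation time, trading some accuracy for smaller indexes. A sketch of reproducing the `*_128` setting at inference; whether the constructor accepts this parameter depends on your sentence-transformers version, so treat it as an assumption:

```python
from sentence_transformers import SparseEncoder

# Assumption: max_active_dims caps each embedding at its 128 largest
# activations, mirroring the evaluator parameter above.
model = SparseEncoder(
    "tomaarsen/csr-mxbai-embed-large-v1-gooaq-2e-4",
    max_active_dims=128,
)
embeddings = model.encode(["how long before chlamydia symptoms appear?"])
```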
+ 
+ | Metric | NanoMSMARCO_128 | NanoNFCorpus_128 | NanoNQ_128 |
+ |:-------------------------|:----------------|:-----------------|:-----------|
+ | dot_accuracy@1 | 0.42 | 0.28 | 0.46 |
+ | dot_accuracy@3 | 0.64 | 0.46 | 0.62 |
+ | dot_accuracy@5 | 0.68 | 0.58 | 0.7 |
+ | dot_accuracy@10 | 0.8 | 0.66 | 0.82 |
+ | dot_precision@1 | 0.42 | 0.28 | 0.46 |
+ | dot_precision@3 | 0.2133 | 0.2867 | 0.2067 |
+ | dot_precision@5 | 0.136 | 0.28 | 0.14 |
+ | dot_precision@10 | 0.08 | 0.246 | 0.082 |
+ | dot_recall@1 | 0.42 | 0.0101 | 0.44 |
+ | dot_recall@3 | 0.64 | 0.0497 | 0.58 |
+ | dot_recall@5 | 0.68 | 0.0768 | 0.65 |
+ | dot_recall@10 | 0.8 | 0.1079 | 0.76 |
+ | **dot_ndcg@10** | **0.6079** | **0.2711** | **0.5977** |
+ | dot_mrr@10 | 0.5469 | 0.3952 | 0.5692 |
+ | dot_map@100 | 0.5547 | 0.1088 | 0.5513 |
+ | row_non_zero_mean_query | 128.0 | 128.0 | 128.0 |
+ | row_sparsity_mean_query | 0.9688 | 0.9688 | 0.9688 |
+ | row_non_zero_mean_corpus | 128.0 | 128.0 | 128.0 |
+ | row_sparsity_mean_corpus | 0.9688 | 0.9688 | 0.9688 |
+ 
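The `row_non_zero` and `row_sparsity` figures follow directly from the active-dimension cap over the 4096-dim CSR space:

```latex
\mathrm{row\_sparsity} = 1 - \frac{n_{\mathrm{active}}}{d_{\mathrm{hidden}}}:
\qquad 1 - \tfrac{128}{4096} = 0.96875, \qquad 1 - \tfrac{256}{4096} = 0.9375
```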
+ #### Sparse Nano BEIR
+ 
+ * Dataset: `NanoBEIR_mean_128`
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
+   ```json
+   {
+       "dataset_names": [
+           "msmarco",
+           "nfcorpus",
+           "nq"
+       ],
+       "max_active_dims": 128
+   }
+   ```
+ 
+ | Metric | Value |
+ |:-------------------------|:-----------|
+ | dot_accuracy@1 | 0.3867 |
+ | dot_accuracy@3 | 0.5733 |
+ | dot_accuracy@5 | 0.6533 |
+ | dot_accuracy@10 | 0.76 |
+ | dot_precision@1 | 0.3867 |
+ | dot_precision@3 | 0.2356 |
+ | dot_precision@5 | 0.1853 |
+ | dot_precision@10 | 0.136 |
+ | dot_recall@1 | 0.29 |
+ | dot_recall@3 | 0.4232 |
+ | dot_recall@5 | 0.4689 |
+ | dot_recall@10 | 0.556 |
+ | **dot_ndcg@10** | **0.4922** |
+ | dot_mrr@10 | 0.5038 |
+ | dot_map@100 | 0.405 |
+ | row_non_zero_mean_query | 128.0 |
+ | row_sparsity_mean_query | 0.9688 |
+ | row_non_zero_mean_corpus | 128.0 |
+ | row_sparsity_mean_corpus | 0.9688 |
+ 
+ #### Sparse Information Retrieval
+ 
+ * Datasets: `NanoMSMARCO_256`, `NanoNFCorpus_256` and `NanoNQ_256`
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator) with these parameters:
+   ```json
+   {
+       "max_active_dims": 256
+   }
+   ```
+ 
+ | Metric | NanoMSMARCO_256 | NanoNFCorpus_256 | NanoNQ_256 |
+ |:-------------------------|:----------------|:-----------------|:-----------|
+ | dot_accuracy@1 | 0.42 | 0.32 | 0.42 |
+ | dot_accuracy@3 | 0.7 | 0.56 | 0.64 |
+ | dot_accuracy@5 | 0.76 | 0.62 | 0.68 |
+ | dot_accuracy@10 | 0.84 | 0.7 | 0.84 |
+ | dot_precision@1 | 0.42 | 0.32 | 0.42 |
+ | dot_precision@3 | 0.2333 | 0.32 | 0.22 |
+ | dot_precision@5 | 0.152 | 0.316 | 0.14 |
+ | dot_precision@10 | 0.084 | 0.262 | 0.088 |
+ | dot_recall@1 | 0.42 | 0.0304 | 0.4 |
+ | dot_recall@3 | 0.7 | 0.0717 | 0.6 |
+ | dot_recall@5 | 0.76 | 0.0931 | 0.63 |
+ | dot_recall@10 | 0.84 | 0.1333 | 0.79 |
+ | **dot_ndcg@10** | **0.6326** | **0.3071** | **0.5943** |
+ | dot_mrr@10 | 0.5661 | 0.4525 | 0.5506 |
+ | dot_map@100 | 0.5727 | 0.143 | 0.533 |
+ | row_non_zero_mean_query | 256.0 | 256.0 | 256.0 |
+ | row_sparsity_mean_query | 0.9375 | 0.9375 | 0.9375 |
+ | row_non_zero_mean_corpus | 256.0 | 256.0 | 256.0 |
+ | row_sparsity_mean_corpus | 0.9375 | 0.9375 | 0.9375 |
+ 
+ #### Sparse Nano BEIR
+ 
+ * Dataset: `NanoBEIR_mean_256`
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
+   ```json
+   {
+       "dataset_names": [
+           "msmarco",
+           "nfcorpus",
+           "nq"
+       ],
+       "max_active_dims": 256
+   }
+   ```
+ 
+ | Metric | Value |
+ |:-------------------------|:-----------|
+ | dot_accuracy@1 | 0.3867 |
+ | dot_accuracy@3 | 0.6333 |
+ | dot_accuracy@5 | 0.6867 |
+ | dot_accuracy@10 | 0.7933 |
+ | dot_precision@1 | 0.3867 |
+ | dot_precision@3 | 0.2578 |
+ | dot_precision@5 | 0.2027 |
+ | dot_precision@10 | 0.1447 |
+ | dot_recall@1 | 0.2835 |
+ | dot_recall@3 | 0.4572 |
+ | dot_recall@5 | 0.4944 |
+ | dot_recall@10 | 0.5878 |
+ | **dot_ndcg@10** | **0.5113** |
+ | dot_mrr@10 | 0.5231 |
+ | dot_map@100 | 0.4163 |
+ | row_non_zero_mean_query | 256.0 |
+ | row_sparsity_mean_query | 0.9375 |
+ | row_non_zero_mean_corpus | 256.0 |
+ | row_sparsity_mean_corpus | 0.9375 |
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### gooaq
+ 
+ * Dataset: [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
+ * Size: 3,011,496 training samples
+ * Columns: <code>question</code> and <code>answer</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | question | answer |
+   |:--------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
+   | type | string | string |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 11.87 tokens</li><li>max: 23 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 60.09 tokens</li><li>max: 201 tokens</li></ul> |
+ * Samples:
+   | question | answer |
+   |:------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+   | <code>what is the difference between clay and mud mask?</code> | <code>The main difference between the two is that mud is a skin-healing agent, while clay is a cosmetic, drying agent. Clay masks are most useful for someone who has oily skin and is prone to breakouts of acne and blemishes.</code> |
+   | <code>myki how much on card?</code> | <code>A full fare myki card costs $6 and a concession, seniors or child myki costs $3. For more information about how to use your myki, visit ptv.vic.gov.au or call 1800 800 007.</code> |
+   | <code>how to find out if someone blocked your phone number on iphone?</code> | <code>If you get a notification like "Message Not Delivered" or you get no notification at all, that's a sign of a potential block. Next, you could try calling the person. If the call goes right to voicemail or rings once (or a half ring) then goes to voicemail, that's further evidence you may have been blocked.</code> |
+ * Loss: [<code>CSRLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#csrloss) with these parameters:
+   ```json
+   {
+       "beta": 0.1,
+       "gamma": 1.0,
+       "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')"
+   }
+   ```
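
The GooAQ pairs can be loaded directly with 🤗 Datasets if you want to inspect or reuse the training data:

```python
from datasets import load_dataset

# 3,011,496 (question, answer) pairs; the answers serve as in-batch
# positives for SparseMultipleNegativesRankingLoss.
dataset = load_dataset("sentence-transformers/gooaq", split="train")
print(dataset[0])  # {'question': '...', 'answer': '...'}
```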
+ 
+ ### Evaluation Dataset
+ 
+ #### gooaq
+ 
+ * Dataset: [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
+ * Size: 1,000 evaluation samples
+ * Columns: <code>question</code> and <code>answer</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | question | answer |
+   |:--------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
+   | type | string | string |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 11.88 tokens</li><li>max: 22 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 61.03 tokens</li><li>max: 127 tokens</li></ul> |
+ * Samples:
+   | question | answer |
+   |:------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+   | <code>how do i program my directv remote with my tv?</code> | <code>['Press MENU on your remote.', 'Select Settings & Help > Settings > Remote Control > Program Remote.', 'Choose the device (TV, audio, DVD) you wish to program. ... ', 'Follow the on-screen prompts to complete programming.']</code> |
+   | <code>are rodrigues fruit bats nocturnal?</code> | <code>Before its numbers were threatened by habitat destruction, storms, and hunting, some of those groups could number 500 or more members. Sunrise, sunset. Rodrigues fruit bats are most active at dawn, at dusk, and at night.</code> |
+   | <code>why does your heart rate increase during exercise bbc bitesize?</code> | <code>During exercise there is an increase in physical activity and muscle cells respire more than they do when the body is at rest. The heart rate increases during exercise. The rate and depth of breathing increases - this makes sure that more oxygen is absorbed into the blood, and more carbon dioxide is removed from it.</code> |
+ * Loss: [<code>CSRLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#csrloss) with these parameters:
+   ```json
+   {
+       "beta": 0.1,
+       "gamma": 1.0,
+       "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')"
+   }
+   ```
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `learning_rate`: 0.0002
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `load_best_model_at_end`: True
+ - `batch_sampler`: no_duplicates
+ 
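Combined with the CSRLoss parameters above, a training run along these lines could reproduce this setup. This is a hedged sketch assuming the `sentence_transformers.sparse_encoder` training API and the module names from this repo's `modules.json` (newer releases may rename `CSRSparsity`); it is not the exact script used:

```python
from datasets import load_dataset
from sentence_transformers.models import Pooling, Transformer
from sentence_transformers.sparse_encoder import (
    SparseEncoder,
    SparseEncoderTrainer,
    SparseEncoderTrainingArguments,
)
from sentence_transformers.sparse_encoder.losses import CSRLoss
from sentence_transformers.sparse_encoder.models import CSRSparsity

# Assemble the three-module pipeline from modules.json.
transformer = Transformer("mixedbread-ai/mxbai-embed-large-v1", max_seq_length=512)
pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
csr = CSRSparsity(input_dim=1024, hidden_dim=4096, k=256, k_aux=512)
model = SparseEncoder(modules=[transformer, pooling, csr])

train_dataset = load_dataset("sentence-transformers/gooaq", split="train")
loss = CSRLoss(model, beta=0.1, gamma=1.0)  # wraps the ranking loss

args = SparseEncoderTrainingArguments(
    output_dir="csr-mxbai-embed-large-v1-gooaq",
    num_train_epochs=1,
    per_device_train_batch_size=64,
    learning_rate=2e-4,
    warmup_ratio=0.1,
    bf16=True,
)
SparseEncoderTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss).train()
```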
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 0.0002
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ 
+ </details>
+ 
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+ 
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_128_dot_ndcg@10 | NanoNFCorpus_128_dot_ndcg@10 | NanoNQ_128_dot_ndcg@10 | NanoBEIR_mean_128_dot_ndcg@10 | NanoMSMARCO_256_dot_ndcg@10 | NanoNFCorpus_256_dot_ndcg@10 | NanoNQ_256_dot_ndcg@10 | NanoBEIR_mean_256_dot_ndcg@10 |
+ |:----------:|:---------:|:-------------:|:---------------:|:---------------------------:|:----------------------------:|:----------------------:|:-----------------------------:|:---------------------------:|:----------------------------:|:----------------------:|:-----------------------------:|
+ | -1 | -1 | - | - | 0.6175 | 0.2875 | 0.5432 | 0.4827 | 0.6158 | 0.3234 | 0.5929 | 0.5107 |
+ | 0.0064 | 300 | 0.3621 | - | - | - | - | - | - | - | - | - |
+ | 0.0128 | 600 | 0.3319 | - | - | - | - | - | - | - | - | - |
+ | 0.0191 | 900 | 0.3212 | - | - | - | - | - | - | - | - | - |
+ | 0.0255 | 1200 | 0.3154 | - | - | - | - | - | - | - | - | - |
+ | 0.0319 | 1500 | 0.3129 | - | - | - | - | - | - | - | - | - |
+ | 0.0383 | 1800 | 0.309 | - | - | - | - | - | - | - | - | - |
+ | 0.0446 | 2100 | 0.317 | - | - | - | - | - | - | - | - | - |
+ | 0.0510 | 2400 | 0.2997 | - | - | - | - | - | - | - | - | - |
+ | 0.0574 | 2700 | 0.3409 | - | - | - | - | - | - | - | - | - |
+ | 0.0638 | 3000 | 0.3251 | 0.3136 | 0.6049 | 0.2393 | 0.5583 | 0.4675 | 0.5950 | 0.2559 | 0.5555 | 0.4688 |
+ | 0.0701 | 3300 | 0.3291 | - | - | - | - | - | - | - | - | - |
+ | 0.0765 | 3600 | 0.3366 | - | - | - | - | - | - | - | - | - |
+ | 0.0829 | 3900 | 0.3286 | - | - | - | - | - | - | - | - | - |
+ | 0.0893 | 4200 | 0.3264 | - | - | - | - | - | - | - | - | - |
+ | 0.0956 | 4500 | 0.3413 | - | - | - | - | - | - | - | - | - |
+ | 0.1020 | 4800 | 0.3352 | - | - | - | - | - | - | - | - | - |
+ | 0.1084 | 5100 | 0.3323 | - | - | - | - | - | - | - | - | - |
+ | 0.1148 | 5400 | 0.3308 | - | - | - | - | - | - | - | - | - |
+ | 0.1211 | 5700 | 0.3127 | - | - | - | - | - | - | - | - | - |
+ | 0.1275 | 6000 | 0.3224 | 0.2949 | 0.5445 | 0.2155 | 0.5394 | 0.4331 | 0.5911 | 0.2340 | 0.5365 | 0.4539 |
+ | 0.1339 | 6300 | 0.3216 | - | - | - | - | - | - | - | - | - |
+ | 0.1403 | 6600 | 0.3202 | - | - | - | - | - | - | - | - | - |
+ | 0.1466 | 6900 | 0.3296 | - | - | - | - | - | - | - | - | - |
+ | 0.1530 | 7200 | 0.3171 | - | - | - | - | - | - | - | - | - |
+ | 0.1594 | 7500 | 0.3141 | - | - | - | - | - | - | - | - | - |
+ | 0.1658 | 7800 | 0.3202 | - | - | - | - | - | - | - | - | - |
+ | 0.1721 | 8100 | 0.3088 | - | - | - | - | - | - | - | - | - |
+ | 0.1785 | 8400 | 0.304 | - | - | - | - | - | - | - | - | - |
+ | 0.1849 | 8700 | 0.3105 | - | - | - | - | - | - | - | - | - |
+ | 0.1913 | 9000 | 0.307 | 0.2849 | 0.6038 | 0.2258 | 0.5471 | 0.4589 | 0.6241 | 0.2449 | 0.5498 | 0.4730 |
+ | 0.1976 | 9300 | 0.3043 | - | - | - | - | - | - | - | - | - |
+ | 0.2040 | 9600 | 0.3035 | - | - | - | - | - | - | - | - | - |
+ | 0.2104 | 9900 | 0.3069 | - | - | - | - | - | - | - | - | - |
+ | 0.2168 | 10200 | 0.3174 | - | - | - | - | - | - | - | - | - |
+ | 0.2231 | 10500 | 0.3111 | - | - | - | - | - | - | - | - | - |
+ | 0.2295 | 10800 | 0.295 | - | - | - | - | - | - | - | - | - |
+ | 0.2359 | 11100 | 0.2892 | - | - | - | - | - | - | - | - | - |
+ | 0.2423 | 11400 | 0.3012 | - | - | - | - | - | - | - | - | - |
+ | 0.2486 | 11700 | 0.3061 | - | - | - | - | - | - | - | - | - |
+ | 0.2550 | 12000 | 0.2863 | 0.2631 | 0.6190 | 0.2720 | 0.5379 | 0.4763 | 0.6056 | 0.2898 | 0.5419 | 0.4791 |
+ | 0.2614 | 12300 | 0.3008 | - | - | - | - | - | - | - | - | - |
+ | 0.2678 | 12600 | 0.2849 | - | - | - | - | - | - | - | - | - |
+ | 0.2741 | 12900 | 0.2876 | - | - | - | - | - | - | - | - | - |
+ | 0.2805 | 13200 | 0.2963 | - | - | - | - | - | - | - | - | - |
+ | 0.2869 | 13500 | 0.2926 | - | - | - | - | - | - | - | - | - |
+ | 0.2933 | 13800 | 0.2855 | - | - | - | - | - | - | - | - | - |
+ | 0.2996 | 14100 | 0.2868 | - | - | - | - | - | - | - | - | - |
+ | 0.3060 | 14400 | 0.294 | - | - | - | - | - | - | - | - | - |
+ | 0.3124 | 14700 | 0.3008 | - | - | - | - | - | - | - | - | - |
+ | 0.3188 | 15000 | 0.293 | 0.2745 | 0.5538 | 0.2847 | 0.5422 | 0.4602 | 0.5615 | 0.2976 | 0.5588 | 0.4726 |
+ | 0.3252 | 15300 | 0.2776 | - | - | - | - | - | - | - | - | - |
+ | 0.3315 | 15600 | 0.2906 | - | - | - | - | - | - | - | - | - |
+ | 0.3379 | 15900 | 0.2874 | - | - | - | - | - | - | - | - | - |
+ | 0.3443 | 16200 | 0.2834 | - | - | - | - | - | - | - | - | - |
+ | 0.3507 | 16500 | 0.2718 | - | - | - | - | - | - | - | - | - |
+ | 0.3570 | 16800 | 0.2834 | - | - | - | - | - | - | - | - | - |
+ | 0.3634 | 17100 | 0.2833 | - | - | - | - | - | - | - | - | - |
+ | 0.3698 | 17400 | 0.281 | - | - | - | - | - | - | - | - | - |
+ | 0.3762 | 17700 | 0.2922 | - | - | - | - | - | - | - | - | - |
+ | 0.3825 | 18000 | 0.279 | 0.2623 | 0.5851 | 0.2696 | 0.5097 | 0.4548 | 0.5849 | 0.2776 | 0.5570 | 0.4732 |
+ | 0.3889 | 18300 | 0.2894 | - | - | - | - | - | - | - | - | - |
+ | 0.3953 | 18600 | 0.283 | - | - | - | - | - | - | - | - | - |
+ | 0.4017 | 18900 | 0.2824 | - | - | - | - | - | - | - | - | - |
+ | 0.4080 | 19200 | 0.2758 | - | - | - | - | - | - | - | - | - |
+ | 0.4144 | 19500 | 0.2893 | - | - | - | - | - | - | - | - | - |
+ | 0.4208 | 19800 | 0.278 | - | - | - | - | - | - | - | - | - |
+ | 0.4272 | 20100 | 0.2814 | - | - | - | - | - | - | - | - | - |
+ | 0.4335 | 20400 | 0.278 | - | - | - | - | - | - | - | - | - |
+ | 0.4399 | 20700 | 0.2783 | - | - | - | - | - | - | - | - | - |
+ | 0.4463 | 21000 | 0.2803 | 0.2510 | 0.5880 | 0.2664 | 0.5664 | 0.4736 | 0.6115 | 0.2734 | 0.5465 | 0.4772 |
+ | 0.4527 | 21300 | 0.2668 | - | - | - | - | - | - | - | - | - |
+ | 0.4590 | 21600 | 0.2828 | - | - | - | - | - | - | - | - | - |
+ | 0.4654 | 21900 | 0.2815 | - | - | - | - | - | - | - | - | - |
+ | 0.4718 | 22200 | 0.2778 | - | - | - | - | - | - | - | - | - |
+ | 0.4782 | 22500 | 0.271 | - | - | - | - | - | - | - | - | - |
+ | 0.4845 | 22800 | 0.2696 | - | - | - | - | - | - | - | - | - |
+ | 0.4909 | 23100 | 0.2698 | - | - | - | - | - | - | - | - | - |
+ | 0.4973 | 23400 | 0.2768 | - | - | - | - | - | - | - | - | - |
+ | 0.5037 | 23700 | 0.2626 | - | - | - | - | - | - | - | - | - |
+ | 0.5100 | 24000 | 0.2611 | 0.2414 | 0.6078 | 0.2635 | 0.5668 | 0.4794 | 0.6231 | 0.2942 | 0.5944 | 0.5039 |
+ | 0.5164 | 24300 | 0.2736 | - | - | - | - | - | - | - | - | - |
+ | 0.5228 | 24600 | 0.2695 | - | - | - | - | - | - | - | - | - |
+ | 0.5292 | 24900 | 0.2673 | - | - | - | - | - | - | - | - | - |
+ | 0.5355 | 25200 | 0.2746 | - | - | - | - | - | - | - | - | - |
+ | 0.5419 | 25500 | 0.2681 | - | - | - | - | - | - | - | - | - |
+ | 0.5483 | 25800 | 0.2676 | - | - | - | - | - | - | - | - | - |
+ | 0.5547 | 26100 | 0.2686 | - | - | - | - | - | - | - | - | - |
+ | 0.5610 | 26400 | 0.2652 | - | - | - | - | - | - | - | - | - |
+ | 0.5674 | 26700 | 0.2596 | - | - | - | - | - | - | - | - | - |
+ | 0.5738 | 27000 | 0.2677 | 0.2494 | 0.6018 | 0.2460 | 0.5280 | 0.4586 | 0.6238 | 0.2775 | 0.5673 | 0.4895 |
+ | 0.5802 | 27300 | 0.2621 | - | - | - | - | - | - | - | - | - |
+ | 0.5865 | 27600 | 0.2558 | - | - | - | - | - | - | - | - | - |
+ | 0.5929 | 27900 | 0.251 | - | - | - | - | - | - | - | - | - |
+ | 0.5993 | 28200 | 0.2601 | - | - | - | - | - | - | - | - | - |
+ | 0.6057 | 28500 | 0.2612 | - | - | - | - | - | - | - | - | - |
+ | 0.6120 | 28800 | 0.2695 | - | - | - | - | - | - | - | - | - |
+ | 0.6184 | 29100 | 0.2662 | - | - | - | - | - | - | - | - | - |
+ | 0.6248 | 29400 | 0.2589 | - | - | - | - | - | - | - | - | - |
+ | 0.6312 | 29700 | 0.2602 | - | - | - | - | - | - | - | - | - |
+ | 0.6376 | 30000 | 0.2698 | 0.2507 | 0.5892 | 0.2996 | 0.5386 | 0.4758 | 0.6102 | 0.2941 | 0.5535 | 0.4860 |
+ | 0.6439 | 30300 | 0.2625 | - | - | - | - | - | - | - | - | - |
+ | 0.6503 | 30600 | 0.2598 | - | - | - | - | - | - | - | - | - |
+ | 0.6567 | 30900 | 0.2594 | - | - | - | - | - | - | - | - | - |
+ | 0.6631 | 31200 | 0.2618 | - | - | - | - | - | - | - | - | - |
+ | 0.6694 | 31500 | 0.2556 | - | - | - | - | - | - | - | - | - |
+ | 0.6758 | 31800 | 0.2591 | - | - | - | - | - | - | - | - | - |
+ | 0.6822 | 32100 | 0.2544 | - | - | - | - | - | - | - | - | - |
+ | 0.6886 | 32400 | 0.2589 | - | - | - | - | - | - | - | - | - |
+ | 0.6949 | 32700 | 0.2522 | - | - | - | - | - | - | - | - | - |
+ | 0.7013 | 33000 | 0.2521 | 0.2535 | 0.6053 | 0.2650 | 0.5329 | 0.4677 | 0.6115 | 0.2925 | 0.6057 | 0.5032 |
+ | 0.7077 | 33300 | 0.2576 | - | - | - | - | - | - | - | - | - |
+ | 0.7141 | 33600 | 0.2582 | - | - | - | - | - | - | - | - | - |
+ | 0.7204 | 33900 | 0.2567 | - | - | - | - | - | - | - | - | - |
+ | 0.7268 | 34200 | 0.2577 | - | - | - | - | - | - | - | - | - |
+ | 0.7332 | 34500 | 0.2568 | - | - | - | - | - | - | - | - | - |
+ | 0.7396 | 34800 | 0.254 | - | - | - | - | - | - | - | - | - |
+ | 0.7459 | 35100 | 0.2489 | - | - | - | - | - | - | - | - | - |
+ | 0.7523 | 35400 | 0.2545 | - | - | - | - | - | - | - | - | - |
+ | 0.7587 | 35700 | 0.2476 | - | - | - | - | - | - | - | - | - |
+ | 0.7651 | 36000 | 0.2637 | 0.2397 | 0.6138 | 0.2726 | 0.5627 | 0.4831 | 0.6056 | 0.2889 | 0.5745 | 0.4897 |
+ | 0.7714 | 36300 | 0.2508 | - | - | - | - | - | - | - | - | - |
+ | 0.7778 | 36600 | 0.2569 | - | - | - | - | - | - | - | - | - |
+ | 0.7842 | 36900 | 0.2419 | - | - | - | - | - | - | - | - | - |
+ | 0.7906 | 37200 | 0.2453 | - | - | - | - | - | - | - | - | - |
+ | 0.7969 | 37500 | 0.2456 | - | - | - | - | - | - | - | - | - |
+ | 0.8033 | 37800 | 0.2497 | - | - | - | - | - | - | - | - | - |
+ | 0.8097 | 38100 | 0.2556 | - | - | - | - | - | - | - | - | - |
+ | 0.8161 | 38400 | 0.252 | - | - | - | - | - | - | - | - | - |
+ | 0.8224 | 38700 | 0.2423 | - | - | - | - | - | - | - | - | - |
+ | 0.8288 | 39000 | 0.2545 | 0.2301 | 0.5927 | 0.2895 | 0.5553 | 0.4792 | 0.5979 | 0.2987 | 0.5587 | 0.4851 |
+ | 0.8352 | 39300 | 0.2482 | - | - | - | - | - | - | - | - | - |
+ | 0.8416 | 39600 | 0.2429 | - | - | - | - | - | - | - | - | - |
+ | 0.8479 | 39900 | 0.2463 | - | - | - | - | - | - | - | - | - |
+ | 0.8543 | 40200 | 0.2354 | - | - | - | - | - | - | - | - | - |
+ | 0.8607 | 40500 | 0.2466 | - | - | - | - | - | - | - | - | - |
+ | 0.8671 | 40800 | 0.2484 | - | - | - | - | - | - | - | - | - |
+ | 0.8734 | 41100 | 0.2448 | - | - | - | - | - | - | - | - | - |
+ | 0.8798 | 41400 | 0.2448 | - | - | - | - | - | - | - | - | - |
+ | 0.8862 | 41700 | 0.2515 | - | - | - | - | - | - | - | - | - |
+ | 0.8926 | 42000 | 0.2428 | 0.2392 | 0.6001 | 0.2826 | 0.5857 | 0.4895 | 0.6208 | 0.3019 | 0.6010 | 0.5079 |
+ | 0.8989 | 42300 | 0.2497 | - | - | - | - | - | - | - | - | - |
+ | 0.9053 | 42600 | 0.2415 | - | - | - | - | - | - | - | - | - |
+ | 0.9117 | 42900 | 0.2408 | - | - | - | - | - | - | - | - | - |
+ | 0.9181 | 43200 | 0.242 | - | - | - | - | - | - | - | - | - |
+ | 0.9245 | 43500 | 0.2412 | - | - | - | - | - | - | - | - | - |
+ | 0.9308 | 43800 | 0.2472 | - | - | - | - | - | - | - | - | - |
+ | 0.9372 | 44100 | 0.2408 | - | - | - | - | - | - | - | - | - |
+ | 0.9436 | 44400 | 0.2374 | - | - | - | - | - | - | - | - | - |
+ | 0.9500 | 44700 | 0.2312 | - | - | - | - | - | - | - | - | - |
+ | **0.9563** | **45000** | **0.2412** | **0.2379** | **0.6079** | **0.2711** | **0.5977** | **0.4922** | **0.6326** | **0.3071** | **0.5943** | **0.5113** |
+ | 0.9627 | 45300 | 0.2381 | - | - | - | - | - | - | - | - | - |
+ | 0.9691 | 45600 | 0.2456 | - | - | - | - | - | - | - | - | - |
+ | 0.9755 | 45900 | 0.2418 | - | - | - | - | - | - | - | - | - |
+ | 0.9818 | 46200 | 0.2355 | - | - | - | - | - | - | - | - | - |
+ | 0.9882 | 46500 | 0.2424 | - | - | - | - | - | - | - | - | - |
+ | 0.9946 | 46800 | 0.2389 | - | - | - | - | - | - | - | - | - |
+ 
+ * The bold row denotes the saved checkpoint.
+ </details>
+ 
+ ### Environmental Impact
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+ - **Energy Consumed**: 1.202 kWh
+ - **Carbon Emitted**: 0.467 kg of CO2
+ - **Hours Used**: 3.125 hours
+ 
+ ### Training Hardware
+ - **On Cloud**: No
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
+ - **RAM Size**: 31.78 GB
+ 
+ ### Framework Versions
+ - Python: 3.11.6
+ - Sentence Transformers: 4.2.0.dev0
+ - Transformers: 4.49.0
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.5.1
+ - Datasets: 2.21.0
+ - Tokenizers: 0.21.1
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### CSRLoss
+ ```bibtex
+ @misc{wen2025matryoshkarevisitingsparsecoding,
+     title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
+     author={Tiansheng Wen and Yifei Wang and Zequn Zeng and Zhong Peng and Yudi Su and Xinyang Liu and Bo Chen and Hongwei Liu and Stefanie Jegelka and Chenyu You},
+     year={2025},
+     eprint={2503.01776},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG},
+     url={https://arxiv.org/abs/2503.01776},
+ }
+ ```
+ 
+ #### SparseMultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "mixedbread-ai/mxbai-embed-large-v1",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.49.0",
+   "type_vocab_size": 2,
+   "use_cache": false,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "__version__": {
+     "sentence_transformers": "4.2.0.dev0",
+     "transformers": "4.49.0",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: ",
+     "passage": ""
+   },
+   "default_prompt_name": null,
+   "model_type": "SparseEncoder",
+   "similarity_fn_name": "dot"
+ }
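
The `prompts` entry means queries should carry the mxbai retrieval instruction while passages are encoded as-is; Sentence Transformers applies the prefix for you when you pass `prompt_name`:

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder("tomaarsen/csr-mxbai-embed-large-v1-gooaq-2e-4")

# "query" prepends "Represent this sentence for searching relevant passages: ";
# the "passage" prompt is empty, so documents are encoded unchanged.
query_emb = model.encode(["whom the bell tolls meaning?"], prompt_name="query")
doc_emb = model.encode(['The phrase "For whom the bell tolls" refers to the church bells that are rung when a person dies.'])
print(model.similarity(query_emb, doc_emb))
```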
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e86b2a89f7f8933cf7bd90586cdf69d0012140e412818234b234f807e51ee574
+ size 1340612432
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_CSRSparsity",
+     "type": "sentence_transformers.sparse_encoder.models.CSRSparsity"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff