Mdean77 committed on
Commit b86e5b6 · verified · 1 Parent(s): 7292e20

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,623 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:400
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-l
widget:
- source_sentence: What potential issues can arise from the use of AI systems in determining
    access to financial resources and essential services?
  sentences:
  - the dispatching of emergency first response services, including by police, firefighters
    and medical aid, as well as of emergency healthcare patient triage systems, should
    also be classified as high-risk since they make decisions in very critical situations
    for the life and health of persons and their property.
  - systems do not entail a high risk to legal and natural persons. In addition, AI
    systems used to evaluate the credit score or creditworthiness of natural persons
    should be classified as high-risk AI systems, since they determine those persons’
    access to financial resources or essential services such as housing, electricity,
    and telecommunication services. AI systems used for those purposes may lead to
    discrimination between persons or groups and may perpetuate historical patterns
    of discrimination, such as that based on racial or ethnic origins, gender, disabilities,
    age or sexual orientation, or may create new forms of discriminatory impacts.
    However, AI systems provided for by Union law for the purpose of detecting fraud
    in the offering
  - In accordance with Articles 2 and 2a of Protocol No 22 on the position of Denmark,
    annexed to the TEU and to the TFEU, Denmark is not bound by rules laid down in
    Article 5(1), first subparagraph, point (g), to the extent it applies to the use
    of biometric categorisation systems for activities in the field of police cooperation
    and judicial cooperation in criminal matters, Article 5(1), first subparagraph,
    point (d), to the extent it applies to the use of AI systems covered by that provision,
    Article 5(1), first subparagraph, point (h), (2) to (6) and Article 26(10) of
    this Regulation adopted on the basis of Article 16 TFEU, or subject to their application,
    which relate to the processing of personal data by the Member States when carrying
- source_sentence: Why is the failure or malfunctioning of safety components in critical
    infrastructure considered a significant risk?
  sentences:
  - As regards the management and operation of critical infrastructure, it is appropriate
    to classify as high-risk the AI systems intended to be used as safety components
    in the management and operation of critical digital infrastructure as listed in
    point (8) of the Annex to Directive (EU) 2022/2557, road traffic and the supply
    of water, gas, heating and electricity, since their failure or malfunctioning
    may put at risk the life and health of persons at large scale and lead to appreciable
    disruptions in the ordinary conduct of social and economic activities. Safety
    components of critical infrastructure, including critical digital infrastructure,
    are systems used to directly protect the physical integrity of critical infrastructure
    or the
  - (54)
  - (42)
- source_sentence: How does the current Regulation relate to the provisions set out
    in Regulation (EU) 2022/2065?
  sentences:
  - (39)
  - '(11)



    This Regulation should be without prejudice to the provisions regarding the liability
    of providers of intermediary services as set out in Regulation (EU) 2022/2065
    of the European Parliament and of the Council (15).













    (12)'
  - (53)
- source_sentence: Why is it important to ensure a consistent and high level of protection
    for AI throughout the Union?
  sentences:
  - AI systems can be easily deployed in a large variety of sectors of the economy
    and many parts of society, including across borders, and can easily circulate
    throughout the Union. Certain Member States have already explored the adoption
    of national rules to ensure that AI is trustworthy and safe and is developed and
    used in accordance with fundamental rights obligations. Diverging national rules
    may lead to the fragmentation of the internal market and may decrease legal certainty
    for operators that develop, import or use AI systems. A consistent and high level
    of protection throughout the Union should therefore be ensured in order to achieve
    trustworthy AI, while divergences hampering the free circulation, innovation,
    deployment and the
  - '(5)



    At the same time, depending on the circumstances regarding its specific application,
    use, and level of technological development, AI may generate risks and cause harm
    to public interests and fundamental rights that are protected by Union law. Such
    harm might be material or immaterial, including physical, psychological, societal
    or economic harm.













    (6)'
  - (57)
- source_sentence: What is the purpose of implementing a risk-based approach for AI
    systems according to the context?
  sentences:
  - use of lethal force and other AI systems in the context of military and defence
    activities. As regards national security purposes, the exclusion is justified
    both by the fact that national security remains the sole responsibility of Member
    States in accordance with Article 4(2) TEU and by the specific nature and operational
    needs of national security activities and specific national rules applicable to
    those activities. Nonetheless, if an AI system developed, placed on the market,
    put into service or used for military, defence or national security purposes is
    used outside those temporarily or permanently for other purposes, for example,
    civilian or humanitarian purposes, law enforcement or public security purposes,
    such a system would fall
  - '(26)



    In order to introduce a proportionate and effective set of binding rules for AI
    systems, a clearly defined risk-based approach should be followed. That approach
    should tailor the type and content of such rules to the intensity and scope of
    the risks that AI systems can generate. It is therefore necessary to prohibit
    certain unacceptable AI practices, to lay down requirements for high-risk AI systems
    and obligations for the relevant operators, and to lay down transparency obligations
    for certain AI systems.













    (27)'
  - To mitigate the risks from high-risk AI systems placed on the market or put into
    service and to ensure a high level of trustworthiness, certain mandatory requirements
    should apply to high-risk AI systems, taking into account the intended purpose
    and the context of use of the AI system and according to the risk-management system
    to be established by the provider. The measures adopted by the providers to comply
    with the mandatory requirements of this Regulation should take into account the
    generally acknowledged state of the art on AI, be proportionate and effective
    to meet the objectives of this Regulation. Based on the New Legislative Framework,
    as clarified in Commission notice ‘The “Blue Guide” on the implementation of EU
    product rules
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy@1
      value: 0.8958333333333334
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 1.0
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1.0
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.8958333333333334
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3333333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.19999999999999998
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09999999999999999
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.8958333333333334
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 1.0
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1.0
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1.0
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9560997762648827
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9409722222222222
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9409722222222223
      name: Cosine Map@100
---

# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
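
The `Pooling` module above uses CLS-token pooling (`pooling_mode_cls_token: True`): the sentence embedding is the hidden state of the first (`[CLS]`) token, which the final `Normalize()` module then L2-normalizes. A minimal pure-Python sketch of those two steps, with toy 4-dimensional values standing in for real 1024-dimensional activations:

```python
import math

# Illustrative token embeddings for one sentence:
# (seq_len, hidden) = (3, 4); the values are made up for this sketch.
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token hidden state
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 0.0, 2.0, 0.0],
]

# (1) CLS pooling: take the first token's hidden state as the sentence vector
pooled = token_embeddings[0]

# (2) Normalize(): L2-normalize the pooled vector to unit length
norm = math.sqrt(sum(x * x for x in pooled))
embedding = [x / norm for x in pooled]

print(embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because every embedding leaves the model with unit norm, downstream cosine similarity reduces to a dot product.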

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mdean77/legal-ft-3")
# Run inference
sentences = [
    'What is the purpose of implementing a risk-based approach for AI systems according to the context?',
    '(26)\n\n\nIn order to introduce a\xa0proportionate and effective set of binding rules for AI systems, a\xa0clearly defined risk-based approach should be followed. That approach should tailor the type and content of such rules to the intensity and scope of the risks that AI systems can generate. It is therefore necessary to prohibit certain unacceptable AI practices, to lay down requirements for high-risk AI systems and obligations for the relevant operators, and to lay down transparency obligations for certain AI systems.\n\n\n\n\n\n\n\n\n\n\n\n\n(27)',
    'use of lethal force and other AI systems in the context of military and defence activities. As regards national security purposes, the exclusion is justified both by the fact that national security remains the sole responsibility of Member States in accordance with Article\xa04(2) TEU and by the specific nature and operational needs of national security activities and specific national rules applicable to those activities. Nonetheless, if an AI system developed, placed on the market, put into service or used for military, defence or national security purposes is used outside those temporarily or permanently for other purposes, for example, civilian or humanitarian purposes, law enforcement or public security purposes, such a\xa0system would fall',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
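
Since the model L2-normalizes its outputs and its similarity function is cosine, `model.similarity` is equivalent to taking dot products between the embedding vectors. A toy pure-Python illustration with 3-dimensional unit vectors standing in for real embeddings:

```python
# Toy, already-normalized "embeddings" (3 sentences x 3 dims; illustrative only)
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]

def dot(u, v):
    """Dot product; equals cosine similarity when u and v are unit vectors."""
    return sum(a * b for a, b in zip(u, v))

# Pairwise similarity matrix, analogous to model.similarity(embeddings, embeddings)
similarities = [[dot(u, v) for v in embeddings] for u in embeddings]

print(similarities[0][1])  # 0.0 (orthogonal vectors)
print(similarities[0][2])  # 0.6
```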

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8958     |
| cosine_accuracy@3   | 1.0        |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.8958     |
| cosine_precision@3  | 0.3333     |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.8958     |
| cosine_recall@3     | 1.0        |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| **cosine_ndcg@10**  | **0.9561** |
| cosine_mrr@10       | 0.941      |
| cosine_map@100      | 0.941      |

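As a rough intuition for these metrics when each query has exactly one relevant passage (as in this evaluation), accuracy@k, recall@k and precision@k all derive from the rank of that single hit: recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (hence precision@5 = 1.0 / 5 = 0.2 above). A hedged sketch of the computation from per-query ranks, not the `InformationRetrievalEvaluator` implementation itself:

```python
# Illustrative 1-based ranks of the single relevant passage for 8 queries
ranks = [1, 1, 1, 2, 1, 3, 1, 1]

def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant passage appears in the top k results."""
    return sum(r <= k for r in ranks) / len(ranks)

print(accuracy_at_k(ranks, 1))  # 0.75
print(accuracy_at_k(ranks, 3))  # 1.0
```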
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 400 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 400 samples:
  |         | sentence_0                                                                          | sentence_1                                                                          |
  |:--------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                              |
  | details | <ul><li>min: 10 tokens</li><li>mean: 20.33 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 93.01 tokens</li><li>max: 186 tokens</li></ul> |
* Samples:
  | sentence_0                                                                                   | sentence_1                                                                                                                                                                                                                                                                       |
  |:---------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>What is the significance of the number 55 in the given context?</code>                | <code>(55)</code>                                                                                                                                                                                                                                                               |
  | <code>How does the number 55 relate to the overall theme or subject being discussed?</code> | <code>(55)</code>                                                                                                                                                                                                                                                               |
  | <code>What types of practices are prohibited by Union law according to the context?</code>  | <code>(45)<br><br><br>Practices that are prohibited by Union law, including data protection law, non-discrimination law, consumer protection law, and competition law, should not be affected by this Regulation.<br><br><br><br><br><br><br><br><br><br><br><br><br>(46)</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
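
MatryoshkaLoss trains the leading dimensions of each embedding to be useful on their own, so vectors can be truncated to any of the listed sizes and re-normalized (recent sentence-transformers versions expose this through the `truncate_dim` argument). A pure-Python sketch of the truncate-then-renormalize step on a toy 6-dimensional vector, standing in for a real 1024-dimensional embedding:

```python
import math

def truncate_and_renormalize(embedding, dim):
    """Keep the first `dim` components, then L2-normalize the result again."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy unit vector; the leading components carry most of the information
vec = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0]
small = truncate_and_renormalize(vec, 2)
print(small)  # each component is approximately 0.7071 (i.e. 1/sqrt(2))
```

Truncated embeddings trade a little retrieval quality for much smaller index storage and faster similarity search.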

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch | Step | cosine_ndcg@10 |
|:-----:|:----:|:--------------:|
| 1.0   | 40   | 0.9715         |
| 1.25  | 50   | 0.9638         |
| 2.0   | 80   | 0.9715         |
| 2.5   | 100  | 0.9638         |
| 3.0   | 120  | 0.9742         |
| 3.75  | 150  | 0.9792         |
| 4.0   | 160  | 0.9700         |
| 5.0   | 200  | 0.9715         |
| 6.0   | 240  | 0.9505         |
| 6.25  | 250  | 0.9505         |
| 7.0   | 280  | 0.9623         |
| 7.5   | 300  | 0.9638         |
| 8.0   | 320  | 0.9561         |
| 8.75  | 350  | 0.9638         |
| 9.0   | 360  | 0.9638         |
| 10.0  | 400  | 0.9561         |


### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.2
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,25 @@
{
  "_name_or_path": "Snowflake/snowflake-arctic-embed-l",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
{
  "__version__": {
    "sentence_transformers": "3.4.1",
    "transformers": "4.48.2",
    "pytorch": "2.5.1+cu124"
  },
  "prompts": {
    "query": "Represent this sentence for searching relevant passages: "
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:97951b2fa99d83964ba0944f238b105c330934a74e6177199135b00cdd5d9e26
size 1336413848
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "max_length": 512,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "[PAD]",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "[SEP]",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff