Commit 9a75104 by king17pvp · verified · 1 Parent(s): 329f210

Added tinybiobert based biencoder

biencoder-checkpoints/checkpoint-tinybiobert/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 312,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
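The config above enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of its token embeddings. The following is a minimal sketch of masked mean pooling in plain Python, assuming padded positions are excluded through an attention mask. The helper name `mean_pool` and the toy 3-dimensional vectors are illustrative only; the model itself produces 312-dimensional vectors.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (hypothetical helper)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0, 4.0]
```

Only the two unmasked vectors contribute, so the padding vector does not shift the average.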
biencoder-checkpoints/checkpoint-tinybiobert/README.md ADDED
@@ -0,0 +1,639 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:332672
8
+ - loss:CachedMultipleNegativesRankingLoss
9
+ base_model: nlpie/tiny-biobert
10
+ widget:
11
+ - source_sentence: mechanism of primordial follicle growth initiation
12
+ sentences:
13
+ - at first sight, rna splicing enables eukaryotes to increase the coding potential
14
+ of their genomes. we shall return to this idea again in this chapter and the next,
15
+ but we first need to describe the cellular machinery that performs this remarkable
16
+ task. o(a) (b) 5˜5˜aoh aho 3˜5˜exon sequence intron sequence 3˜exon 2˜
17
+ - pituitary gonadotropins maintain a normal ovarian reserve by promoting the general
18
+ health of the ovary. however, the rate at which resting primordial follicles enter
19
+ the growth process appears to be independent of pituitary gonadotropins. the decision
20
+ of a resting follicle to enter the early growth phase is primarily dependent on
21
+ intraovarian paracrine factors produced by both the follicle cells and oocytes.
22
+ the gamete in primordial follicles the gamete is derived from oogonia that have
23
+ entered the first meiotic division; such oogonia are referred to as primary oocytes.
24
+ primary oocytes progress through most of prophase of the first meiotic division
25
+ - primary sclerosing cholangitis (psc) is a disorder characterized by both intrahepatic
26
+ and extrahepatic bile duct inflammation and fibrosis, frequently leading to biliary
27
+ cirrhosis and hepatic failure; approximately 5% of patients with uc have psc,
28
+ but 50–75% of patients with psc have ibd. psc occurs less often in patients with
29
+ cd. although it can be recognized after the diagnosis of ibd, psc can be detected
30
+ earlier or even years after proctocolectomy. consistent with this, the immunogenetic
31
+ basis for psc appears to be overlapping but distinct from uc based on gwas, although
32
+ both ibd and psc are commonly panca positive. most patients have no symptoms at
33
+ the time of diagnosis; when symptoms are present, they consist of fatigue, jaundice,
34
+ abdominal pain, fever, anorexia, and malaise. the traditional gold standard diagnostic
35
+ test is endoscopic retrograde cholangiopancreatography (ercp), but magnetic resonance
36
+ cholangiopancreatography (mrcp) is also sensitive and specific. mrcp is
37
+ - source_sentence: naming and numbering system for fatty acid carbons
38
+ sentences:
39
+ - 'and meta-analysis examining the impact of incision on outcomes after abdominal
40
+ surgery. am j surg. 2013;206:400-409. doi: 10.1016/j.amjsurg.2012.11.008bilsel
41
+ y, abci i. the search for ideal hernia repair; mesh materi-als and types. int
42
+ j surg. 2012;10:317-321. doi: 10.1016/j.ijsu.2012.05.002brown sr, tiernan j. transverse
43
+ verses midline incisions for abdom-inal surgery. cochrane database syst rev. 2005;(4):cd005199.
44
+ doi: 10.1002/14651858.cd005199.pub2caro-tarrago a, olona casas c, jimenez salido
45
+ a, et al. prevention of incisional hernia in midline laparotomy with an onlay
46
+ mesh: a randomized clinical trial. world j surg. 2014;38:2223-2230. doi: 10.1007/s00268-014-2510-6conze
47
+ j, kingsnorth an, flament jb, et al. randomized clinical trial comparing lightweight
48
+ composite mesh with polyester or polypropylene mesh for incisional hernia repair.
49
+ br j surg. 2005;92:1488-1493. doi: 10.1002/bjs.5208de vries reilingh ts, van goor
50
+ h, rosman c, et al. “compo-nents separation technique” for the'
51
+ - this entailed a risk of serious injury. a similar syndrome in malaysia and indonesia
52
+ is known as latah and in siberia as miryachit. this syndrome has been framed in
53
+ psychologic terms as conditioned responses (saint-hilaire et al) or as culturally
54
+ determined behavior (simons). possibly some of the complex secondary phenomena
55
+ can be explained in this way, but the stereotyped onset with an uncontrollable
56
+ startle and the familial occurrence attest to a biologic basis. the most common
57
+ mutation is in the 1-subunit of the inhibitory glycine receptor glra1 (shiang
58
+ et al) but other glycine receptor–related genes have been implicated in other
59
+ cases. as pointed out by suhren and associates and by kurczynski, the condition
60
+ is transmitted in some families as an autosomal dominant trait. the subject has
61
+ been reviewed by wilkins and colleagues and by ryan and associates.
62
+ - 'the common names and structures of some fatty acids of physiologic importance
63
+ are listed in figure 16.4. in humans, fatty acids with an even number of carbon
64
+ atoms (16, 18, or 20) predominate, with longer fatty acids (>22 carbons) being
65
+ found in the brain. the carbon atoms are numbered, beginning with the carbonyl
66
+ carbon as carbon 1. the number before the colon indicates the number of carbons
67
+ in the chain, and those after the colon indicate the numbers and positions (relative
68
+ to the carboxyl end) of double bonds. for example, as denoted in figure 16.4,
69
+ arachidonic acid, 20:4(5,8,11,14), is 20 carbons long and has four double bonds
70
+ (between carbons 5–6, 8–9, 11–12, and 14–15). [note: carbon 2, the carbon to which
71
+ the carboxyl group is attached, is also called the α-carbon, carbon 3 is the βcarbon,
72
+ and carbon 4 is the γ-carbon. the carbon of the terminal methyl group is called
73
+ the ω-carbon regardless of the chain length.] the double bonds in a fatty acid
74
+ can also be referenced relative'
75
+ - source_sentence: how does the extent of disease at the start of androgen depletion
76
+ therapy relate to prognosis?
77
+ sentences:
78
+ - (c) localization of the mre11 complex to damaged dna as visualized by antibodies
79
+ against the mre11 subunit (red). mre11 is a nuclease that processes damaged dna
80
+ in preparation for homologous recombination (see figure 5–48). (a), (b), and (c)
81
+ were processed 30 minutes after x-irradiation. (from b.e. nelms et al., science
82
+ 280:590– 592, 1998. with permission from aaas.) figure 5–53 chromosome crossing-over
83
+ occurs in meiosis. meiosis is the process by which a diploid cell gives rise to
84
+ four haploid germ cells, as described in detail in chapter 17. meiosis produces
85
+ germ cells in which the paternal and maternal genetic information (red and blue)
86
+ has been reassorted through chromosome crossovers. in addition, many short regions
87
+ of gene conversion occur, as indicated.
88
+ - endometriosis infertility managment hormonal suppression of endometriosis typically
89
+ has a minimal benefit for endometriosis-related infertility (265). in minimal
90
+ to mild disease, laparoscopic ablation appears to significantly improve pregnancy
91
+ rates when compared to diagnostic laparoscopy alone, although there remains some
92
+ dissent (267,269). one major randomized trial reported 31% versus 17% pregnancy
93
+ rates over 3 years with a subsequent meta-analysis supporting these findings (265,266,270,271).
94
+ although authors have estimated that eight laparoscopies involving treatment of
95
+ mild or minimal endometriosis would need to be performed for each pregnancy gained,
96
+ that number is likely to be much higher given that not everyone who undergoes
97
+ laparoscopy will have endometriosis (267,270). the benefit of surgical management
98
+ of endometriosis is even less clear for moderate to severe disease, although removal
99
+ of endometriomas may be indicated prior to ivf when they would interfere with
100
+ oocyte
101
+ - proportional to disease extent at the time androgen depletion is first started,
102
+ whereas the degree of psa decline at 6 months has been shown to be prognostic.
103
+ in a large-scale trial, psa nadir proved prognostic.
104
+ - source_sentence: side effects of pamidronate and zoledronate infusions
105
+ sentences:
106
+ - to allow an early diagnosis of pancreatic cancer. despite the fact that many tumor
107
+ markers such as ca19-9 have been studied, there are still no effective screening
108
+ tests for pancreatic cancer. research tak-ing advantage of recent advances in
109
+ genomics, gene expression analysis, and proteomics has demonstrated thousands
110
+ of genes and corresponding proteins that are differentially expressed in pancreatic
111
+ tumors that have potential for early detection of pan-creatic cancer.316 some
112
+ of these proteins would be expected to be expressed at the cell surface or in
113
+ pancreatic juice and may become useful as biomarkers for pancreatic cancer in
114
+ the future.in patients presenting with jaundice, a reasonable first diagnostic
115
+ imaging study is abdominal ultrasound. if bile duct dilation is not seen, hepatocellular
116
+ disease is likely. demonstra-tion of cholelithiasis and bile duct dilation suggests
117
+ a diagnosis of choledocholithiasis, and the next logical step would be ercp to
118
+ clear the bile duct. in the
119
+ - in general, the prognosis for regular ovulatory cycles and subsequent normal fertility
120
+ in young women who experience an episode of abnormal bleeding is good, particularly
121
+ for patients who develop abnormal bleeding as a result of anovulation within the
122
+ first years after menarche and in whom there are no signs of other specific conditions.
123
+ some girls, including those in whom there is an underlying medical cause, such
124
+ as pcos, will continue to have abnormal bleeding into middle and late adolescence
125
+ and adulthood and will benefit from the ongoing use of oral contraceptives to
126
+ manage hirsutism, acne, and irregular periods. ovulation induction may ultimately
127
+ be necessary to achieve fertility in these individuals, although teens should
128
+ be advised that they should not assume that they are infertile. individuals with
129
+ coagulopathies may benefit from ongoing oral contraceptive use, use of tranexamic
130
+ acid, or intranasal desmopressin (99).
131
+ - pamidronate, 60–90 mg, infused over 2–4 hours, and zoledronate, 4 mg, infused
132
+ over at least 15 minutes, have been approved for the treatment of hypercalcemia
133
+ of malignancy and have largely replaced the less effective etidronate for this
134
+ indication. the bisphosphonate effects generally persist for weeks, but treatment
135
+ can be repeated after a 7-day interval if necessary and if renal function is not
136
+ impaired. some patients experience a self-limited flu-like syndrome after the
137
+ initial infusion, but subsequent infusions generally do not have this side effect.
138
+ repeated doses of these drugs have been linked to renal deterioration and osteonecrosis
139
+ of the jaw, but this adverse effect is rare.
140
+ - source_sentence: comparison of erythromycin and azithromycin in terms of cost and
141
+ tolerability
142
+ sentences:
143
+ - alternative agents are erythromycin and azithromycin. azithromycin is more expensive
144
+ but offers the advantages of better gastrointestinal tolerability, once-daily
145
+ dosing, and a 5-day treatment course. resistance to erythromycin and other macrolides
146
+ is common among isolates from several countries, including spain, italy, finland,
147
+ japan, and korea. macrolide resistance may be becoming more prevalent elsewhere
148
+ with the increasing use of this class of antibiotics. in areas with resistance
149
+ rates exceeding 5–10%, macrolides should be avoided unless results of susceptibility
150
+ testing are known. follow-up culture after treatment is no longer routinely recommended
151
+ but may be warranted in selected cases, such as those involving patients or families
152
+ with frequent streptococcal infections or those occurring in situations in which
153
+ the risk of arf is thought to be high (e.g., when cases of arf have recently been
154
+ reported in the community).
155
+ - 50 mm/h. the main diagnostic difficulty in diagnosis arises when the emg performed
156
+ early in the course of illness shows conduction block that simulates a demyelinating
157
+ polyneuropathy. nerve biopsy should then settle the issue.
158
+ - idioventricular rhythms three or more ventricular beats at a rate slower than
159
+ 100 beats/min are termed idioventricular rhythm (fig. 277-1c). automaticity is
160
+ the likely mechanism. idioventricular rhythms are common during acute mi (chap.
161
+ 295) and may emerge during sinus bradycardia. atropine may be administered to
162
+ increase the sinus rates if the loss of atrioventricular synchrony leads to hemodynamic
163
+ compromise. this rhythm is also common in patients with cardiomyopathies or sleep
164
+ apnea. it can also be idiopathic, often emerging when the sinus rate slows during
165
+ sleep. therapy should target any underlying cause and correction of bradycardia.
166
+ specific therapy for asymptomatic idioventricular rhythm is not necessary.
167
+ pipeline_tag: sentence-similarity
168
+ library_name: sentence-transformers
169
+ metrics:
170
+ - cosine_accuracy@1
171
+ - cosine_accuracy@3
172
+ - cosine_accuracy@5
173
+ - cosine_accuracy@10
174
+ - cosine_precision@1
175
+ - cosine_precision@3
176
+ - cosine_precision@5
177
+ - cosine_precision@10
178
+ - cosine_recall@1
179
+ - cosine_recall@3
180
+ - cosine_recall@5
181
+ - cosine_recall@10
182
+ - cosine_ndcg@10
183
+ - cosine_mrr@10
184
+ - cosine_map@100
185
+ model-index:
186
+ - name: SentenceTransformer based on nlpie/tiny-biobert
187
+ results:
188
+ - task:
189
+ type: information-retrieval
190
+ name: Information Retrieval
191
+ dataset:
192
+ name: Unknown
193
+ type: unknown
194
+ metrics:
195
+ - type: cosine_accuracy@1
196
+ value: 0.5447024940253692
197
+ name: Cosine Accuracy@1
198
+ - type: cosine_accuracy@3
199
+ value: 0.751761750107237
200
+ name: Cosine Accuracy@3
201
+ - type: cosine_accuracy@5
202
+ value: 0.8228138979104112
203
+ name: Cosine Accuracy@5
204
+ - type: cosine_accuracy@10
205
+ value: 0.893927323978185
206
+ name: Cosine Accuracy@10
207
+ - type: cosine_precision@1
208
+ value: 0.5447024940253692
209
+ name: Cosine Precision@1
210
+ - type: cosine_precision@3
211
+ value: 0.2505872500357456
212
+ name: Cosine Precision@3
213
+ - type: cosine_precision@5
214
+ value: 0.16456277958208224
215
+ name: Cosine Precision@5
216
+ - type: cosine_precision@10
217
+ value: 0.08939273239781849
218
+ name: Cosine Precision@10
219
+ - type: cosine_recall@1
220
+ value: 0.5447024940253692
221
+ name: Cosine Recall@1
222
+ - type: cosine_recall@3
223
+ value: 0.751761750107237
224
+ name: Cosine Recall@3
225
+ - type: cosine_recall@5
226
+ value: 0.8228138979104112
227
+ name: Cosine Recall@5
228
+ - type: cosine_recall@10
229
+ value: 0.893927323978185
230
+ name: Cosine Recall@10
231
+ - type: cosine_ndcg@10
232
+ value: 0.7189132708892791
233
+ name: Cosine Ndcg@10
234
+ - type: cosine_mrr@10
235
+ value: 0.6628477055180666
236
+ name: Cosine Mrr@10
237
+ - type: cosine_map@100
238
+ value: 0.6674861654698347
239
+ name: Cosine Map@100
240
+ ---
241
+
242
+ # SentenceTransformer based on nlpie/tiny-biobert
243
+
244
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nlpie/tiny-biobert](https://huggingface.co/nlpie/tiny-biobert). It maps sentences & paragraphs to a 312-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
245
+
246
+ ## Model Details
247
+
248
+ ### Model Description
249
+ - **Model Type:** Sentence Transformer
250
+ - **Base model:** [nlpie/tiny-biobert](https://huggingface.co/nlpie/tiny-biobert) <!-- at revision a49b9101d3e9af1f646a43cf3524231a0d1404a1 -->
251
+ - **Maximum Sequence Length:** 512 tokens
252
+ - **Output Dimensionality:** 312 dimensions
253
+ - **Similarity Function:** Cosine Similarity
254
+ <!-- - **Training Dataset:** Unknown -->
255
+ <!-- - **Language:** Unknown -->
256
+ <!-- - **License:** Unknown -->
257
+
258
+ ### Model Sources
259
+
260
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
261
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
262
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
263
+
264
+ ### Full Model Architecture
265
+
266
+ ```
267
+ SentenceTransformer(
268
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
269
+ (1): Pooling({'word_embedding_dimension': 312, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
270
+ )
271
+ ```
272
+
273
+ ## Usage
274
+
275
+ ### Direct Usage (Sentence Transformers)
276
+
277
+ First install the Sentence Transformers library:
278
+
279
+ ```bash
280
+ pip install -U sentence-transformers
281
+ ```
282
+
283
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'comparison of erythromycin and azithromycin in terms of cost and tolerability',
+     'alternative agents are erythromycin and azithromycin. azithromycin is more expensive but offers the advantages of better gastrointestinal tolerability, once-daily dosing, and a 5-day treatment course. resistance to erythromycin and other macrolides is common among isolates from several countries, including spain, italy, finland, japan, and korea. macrolide resistance may be becoming more prevalent elsewhere with the increasing use of this class of antibiotics. in areas with resistance rates exceeding 5–10%, macrolides should be avoided unless results of susceptibility testing are known. follow-up culture after treatment is no longer routinely recommended but may be warranted in selected cases, such as those involving patients or families with frequent streptococcal infections or those occurring in situations in which the risk of arf is thought to be high (e.g., when cases of arf have recently been reported in the community).',
+     '50 mm/h. the main diagnostic difficulty in diagnosis arises when the emg performed early in the course of illness shows conduction block that simulates a demyelinating polyneuropathy. nerve biopsy should then settle the issue.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 312]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
304
+
305
+ <!--
306
+ ### Direct Usage (Transformers)
307
+
308
+ <details><summary>Click to see the direct usage in Transformers</summary>
309
+
310
+ </details>
311
+ -->
312
+
313
+ <!--
314
+ ### Downstream Usage (Sentence Transformers)
315
+
316
+ You can finetune this model on your own dataset.
317
+
318
+ <details><summary>Click to expand</summary>
319
+
320
+ </details>
321
+ -->
322
+
323
+ <!--
324
+ ### Out-of-Scope Use
325
+
326
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
327
+ -->
328
+
329
+ ## Evaluation
330
+
331
+ ### Metrics
332
+
333
+ #### Information Retrieval
334
+
335
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
336
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.5447     |
+ | cosine_accuracy@3   | 0.7518     |
+ | cosine_accuracy@5   | 0.8228     |
+ | cosine_accuracy@10  | 0.8939     |
+ | cosine_precision@1  | 0.5447     |
+ | cosine_precision@3  | 0.2506     |
+ | cosine_precision@5  | 0.1646     |
+ | cosine_precision@10 | 0.0894     |
+ | cosine_recall@1     | 0.5447     |
+ | cosine_recall@3     | 0.7518     |
+ | cosine_recall@5     | 0.8228     |
+ | cosine_recall@10    | 0.8939     |
+ | **cosine_ndcg@10**  | **0.7189** |
+ | cosine_mrr@10       | 0.6628     |
+ | cosine_map@100      | 0.6675     |
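In the table above, accuracy@k and recall@k coincide, and precision@k equals accuracy@k divided by k (for example, 0.7518 / 3 ≈ 0.2506). That pattern is what one would expect if every query has exactly one relevant passage. This is an assumption, since the evaluation set is not described. A small sketch with made-up rankings shows the relationship; `ranks` and the helper names are hypothetical and not part of the library.

```python
from typing import List, Optional


def accuracy_at_k(ranks: List[Optional[int]], k: int) -> float:
    """Fraction of queries whose single relevant passage is ranked in the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def precision_at_k(ranks: List[Optional[int]], k: int) -> float:
    # With one relevant passage per query, at most 1 of the k results is relevant.
    return accuracy_at_k(ranks, k) / k


def recall_at_k(ranks: List[Optional[int]], k: int) -> float:
    # One relevant passage per query: recall is 1 if it is found, else 0.
    return accuracy_at_k(ranks, k)


def mrr_at_k(ranks: List[Optional[int]], k: int) -> float:
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)


# 1-based rank of the relevant passage for four toy queries (None = not retrieved).
ranks = [1, 2, 4, None]
print(accuracy_at_k(ranks, 3))   # 0.5
print(precision_at_k(ranks, 3))  # 0.5 / 3
print(mrr_at_k(ranks, 10))       # 0.4375
```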
354
+
355
+ <!--
356
+ ## Bias, Risks and Limitations
357
+
358
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
359
+ -->
360
+
361
+ <!--
362
+ ### Recommendations
363
+
364
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
365
+ -->
366
+
367
+ ## Training Details
368
+
369
+ ### Training Dataset
370
+
371
+ #### Unnamed Dataset
372
+
373
+ * Size: 332,672 training samples
374
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
375
+ * Approximate statistics based on the first 1000 samples:
376
+ | | sentence_0 | sentence_1 |
377
+ |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
378
+ | type | string | string |
379
+ | details | <ul><li>min: 6 tokens</li><li>mean: 15.8 tokens</li><li>max: 40 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 185.45 tokens</li><li>max: 446 tokens</li></ul> |
380
+ * Samples:
381
+ | sentence_0 | sentence_1 |
382
+ |:--------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
383
+ | <code>watershed area capillaries in spinal cord</code> | <code>the posterior medullary arteries form the paired posterior spinal arteries that supply the dorsal third of the cord by means of direct penetrating vessels and a plexus of pial vessels (similar to that of the ventral cord, with which it anastomoses freely). within the cord substance, then, there is a “watershed” area of capillaries where the penetrating branches of the anterior spinal artery meet the penetrating branches of the posterior spinal arteries and the branches of the circumferential pial network. all spinal segments, because of the variable size of collateral arteries, do not have the same abundance of circulatory protection.</code> |
384
+ | <code>sulfa drugs allergy symptoms and reactions</code> | <code>drugs of abuse (eg, amphetamines, cocaine, drugs of abuse (eg, heroin/opioids) lsd), meperidine sympathomimetics parasympathomimetics (eg, pilocarpine), organophosphates sulfa drugs sulfonamide antibiotics, sulfasalazine, scary sulfa pharm facts probenecid, furosemide, acetazolamide, celecoxib, thiazides, sulfonylureas. patients with sulfa allergies may develop fever, urinary tract infection, stevens-johnson syndrome, hemolytic anemia, thrombocytopenia, agranulocytosis, acute interstitial nephritis, and urticaria (hives). “medicine is a science of uncertainty and an art of probability.” “there are two kinds of statistics: the kind you look up and the kind you make up.” “on a long enough timeline, the survival rate for everyone drops to zero.” “there are three kinds of lies: lies, damned lies, and statistics.”</code> |
385
+ | <code>hla genotype association with type 1 diabetes susceptibility</code> | <code>fig. 15.38 population studies show association of susceptibility to type 1 diabetes with hla genotype. the hla genotypes (determined by serotyping) of patients with diabetes (lower panel) are not representative of those found in the general population (upper panel). almost all patients with diabetes express hla‑dr3 and/or hla‑dr4, and hla‑dr3/dr4 heterozygosity is greatly overrepresented in diabetics compared with controls. these alleles are linked tightly to hla‑dq alleles that confer susceptibility to type 1 diabetes. by contrast, hla‑dr2 protects against the development of diabetes and is found only extremely rarely in patients with diabetes. the small letter x represents any allele other than dr2, dr3, or dr4. family studies of hla haplotypes in type 1 diabetes</code> |
+ * Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim",
+       "mini_batch_size": 32
+   }
+   ```
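With `scale` set to 20.0 and `cos_sim` as the similarity function, the ranking objective treats every other passage in the batch as a negative. Cosine similarities are multiplied by the scale and fed to a cross-entropy whose target is the matching passage. The sketch below implements that objective in plain Python, under the assumption that the cached variant changes only how gradients are computed in mini-batches (here 32) and not the loss value. The helper names and toy vectors are illustrative.

```python
import math
from typing import List


def cos_sim(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_ranking_loss(queries: List[List[float]],
                          passages: List[List[float]],
                          scale: float = 20.0) -> float:
    """Mean cross-entropy where passage i is the positive for query i."""
    losses = []
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, passage) for passage in passages]
        # Numerically stable log-sum-exp over all passages in the batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own passage
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other passage

print(in_batch_ranking_loss(queries, aligned))  # close to 0
print(in_batch_ranking_loss(queries, swapped))  # close to 20
```

A larger scale sharpens the softmax, so mismatched pairs are penalized more heavily.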
394
+
395
+ ### Training Hyperparameters
396
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `num_train_epochs`: 5
+ - `fp16`: True
+ - `multi_dataset_batch_sampler`: round_robin
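The step counts in the training logs follow from these settings. With 332,672 samples and a per-device batch size of 64, one epoch takes 5,198 steps, which matches the step logged at epoch 1.0. The sketch assumes a single device, since the device count is not stated; `gradient_accumulation_steps` is 1 in the full hyperparameter list.

```python
import math

dataset_size = 332_672
per_device_train_batch_size = 64
num_train_epochs = 5
num_devices = 1  # assumption: the source does not state the device count

steps_per_epoch = math.ceil(dataset_size / (per_device_train_batch_size * num_devices))
total_steps = steps_per_epoch * num_train_epochs
epoch_at_step_500 = round(500 / steps_per_epoch, 4)

print(steps_per_epoch, total_steps, epoch_at_step_500)  # 5198 25990 0.0962
```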
404
+
405
+ #### All Hyperparameters
406
+ <details><summary>Click to expand</summary>
407
+
408
+ - `overwrite_output_dir`: False
409
+ - `do_predict`: False
410
+ - `eval_strategy`: steps
411
+ - `prediction_loss_only`: True
412
+ - `per_device_train_batch_size`: 64
413
+ - `per_device_eval_batch_size`: 64
414
+ - `per_gpu_train_batch_size`: None
415
+ - `per_gpu_eval_batch_size`: None
416
+ - `gradient_accumulation_steps`: 1
417
+ - `eval_accumulation_steps`: None
418
+ - `torch_empty_cache_steps`: None
419
+ - `learning_rate`: 5e-05
420
+ - `weight_decay`: 0.0
421
+ - `adam_beta1`: 0.9
422
+ - `adam_beta2`: 0.999
423
+ - `adam_epsilon`: 1e-08
424
+ - `max_grad_norm`: 1
425
+ - `num_train_epochs`: 5
426
+ - `max_steps`: -1
427
+ - `lr_scheduler_type`: linear
428
+ - `lr_scheduler_kwargs`: {}
429
+ - `warmup_ratio`: 0.0
430
+ - `warmup_steps`: 0
431
+ - `log_level`: passive
432
+ - `log_level_replica`: warning
433
+ - `log_on_each_node`: True
434
+ - `logging_nan_inf_filter`: True
435
+ - `save_safetensors`: True
436
+ - `save_on_each_node`: False
437
+ - `save_only_model`: False
438
+ - `restore_callback_states_from_checkpoint`: False
439
+ - `no_cuda`: False
440
+ - `use_cpu`: False
441
+ - `use_mps_device`: False
442
+ - `seed`: 42
443
+ - `data_seed`: None
444
+ - `jit_mode_eval`: False
445
+ - `use_ipex`: False
446
+ - `bf16`: False
447
+ - `fp16`: True
448
+ - `fp16_opt_level`: O1
449
+ - `half_precision_backend`: auto
450
+ - `bf16_full_eval`: False
451
+ - `fp16_full_eval`: False
452
+ - `tf32`: None
453
+ - `local_rank`: 0
454
+ - `ddp_backend`: None
455
+ - `tpu_num_cores`: None
456
+ - `tpu_metrics_debug`: False
457
+ - `debug`: []
458
+ - `dataloader_drop_last`: False
459
+ - `dataloader_num_workers`: 0
460
+ - `dataloader_prefetch_factor`: None
461
+ - `past_index`: -1
462
+ - `disable_tqdm`: False
463
+ - `remove_unused_columns`: True
464
+ - `label_names`: None
465
+ - `load_best_model_at_end`: False
466
+ - `ignore_data_skip`: False
467
+ - `fsdp`: []
468
+ - `fsdp_min_num_params`: 0
469
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
470
+ - `tp_size`: 0
471
+ - `fsdp_transformer_layer_cls_to_wrap`: None
472
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
473
+ - `deepspeed`: None
474
+ - `label_smoothing_factor`: 0.0
475
+ - `optim`: adamw_torch
476
+ - `optim_args`: None
477
+ - `adafactor`: False
478
+ - `group_by_length`: False
479
+ - `length_column_name`: length
480
+ - `ddp_find_unused_parameters`: None
481
+ - `ddp_bucket_cap_mb`: None
482
+ - `ddp_broadcast_buffers`: False
483
+ - `dataloader_pin_memory`: True
484
+ - `dataloader_persistent_workers`: False
485
+ - `skip_memory_metrics`: True
486
+ - `use_legacy_prediction_loop`: False
487
+ - `push_to_hub`: False
488
+ - `resume_from_checkpoint`: None
489
+ - `hub_model_id`: None
490
+ - `hub_strategy`: every_save
491
+ - `hub_private_repo`: None
492
+ - `hub_always_push`: False
493
+ - `gradient_checkpointing`: False
494
+ - `gradient_checkpointing_kwargs`: None
495
+ - `include_inputs_for_metrics`: False
496
+ - `include_for_metrics`: []
497
+ - `eval_do_concat_batches`: True
498
+ - `fp16_backend`: auto
499
+ - `push_to_hub_model_id`: None
500
+ - `push_to_hub_organization`: None
501
+ - `mp_parameters`:
502
+ - `auto_find_batch_size`: False
503
+ - `full_determinism`: False
504
+ - `torchdynamo`: None
505
+ - `ray_scope`: last
506
+ - `ddp_timeout`: 1800
507
+ - `torch_compile`: False
508
+ - `torch_compile_backend`: None
509
+ - `torch_compile_mode`: None
510
+ - `include_tokens_per_second`: False
511
+ - `include_num_input_tokens_seen`: False
512
+ - `neftune_noise_alpha`: None
513
+ - `optim_target_modules`: None
514
+ - `batch_eval_metrics`: False
515
+ - `eval_on_start`: False
516
+ - `use_liger_kernel`: False
517
+ - `eval_use_gather_object`: False
518
+ - `average_tokens_across_devices`: False
519
+ - `prompts`: None
520
+ - `batch_sampler`: batch_sampler
521
+ - `multi_dataset_batch_sampler`: round_robin
522
+
523
+ </details>
524
+
525
+ ### Training Logs
526
+ | Epoch  | Step  | Training Loss | cosine_ndcg@10 |
+ |:------:|:-----:|:-------------:|:--------------:|
+ | 0.0962 | 500   | 1.8469        | -              |
+ | 0.1924 | 1000  | 0.301         | 0.5635         |
+ | 0.2886 | 1500  | 0.2186        | -              |
+ | 0.3848 | 2000  | 0.1915        | 0.6145         |
+ | 0.4810 | 2500  | 0.1615        | -              |
+ | 0.5771 | 3000  | 0.1504        | 0.6395         |
+ | 0.6733 | 3500  | 0.1451        | -              |
+ | 0.7695 | 4000  | 0.1365        | 0.6568         |
+ | 0.8657 | 4500  | 0.1247        | -              |
+ | 0.9619 | 5000  | 0.126         | 0.6666         |
+ | 1.0    | 5198  | -             | 0.6692         |
+ | 1.0581 | 5500  | 0.1102        | -              |
+ | 1.1543 | 6000  | 0.1075        | 0.6740         |
+ | 1.2505 | 6500  | 0.1025        | -              |
+ | 1.3467 | 7000  | 0.1011        | 0.6782         |
+ | 1.4429 | 7500  | 0.099         | -              |
+ | 1.5391 | 8000  | 0.0961        | 0.6903         |
+ | 1.6352 | 8500  | 0.0902        | -              |
+ | 1.7314 | 9000  | 0.0914        | 0.6915         |
+ | 1.8276 | 9500  | 0.0894        | -              |
+ | 1.9238 | 10000 | 0.0881        | 0.6972         |
+ | 2.0    | 10396 | -             | 0.7002         |
+ | 2.0200 | 10500 | 0.0848        | -              |
+ | 2.1162 | 11000 | 0.0779        | 0.7008         |
+ | 2.2124 | 11500 | 0.0756        | -              |
+ | 2.3086 | 12000 | 0.075         | 0.7016         |
+ | 2.4048 | 12500 | 0.0785        | -              |
+ | 2.5010 | 13000 | 0.0744        | 0.7027         |
+ | 2.5972 | 13500 | 0.0739        | -              |
+ | 2.6933 | 14000 | 0.0741        | 0.7077         |
+ | 2.7895 | 14500 | 0.0704        | -              |
+ | 2.8857 | 15000 | 0.074         | 0.7097         |
+ | 2.9819 | 15500 | 0.0696        | -              |
+ | 3.0    | 15594 | -             | 0.7127         |
+ | 3.0781 | 16000 | 0.0663        | 0.7135         |
+ | 3.1743 | 16500 | 0.0656        | -              |
+ | 3.2705 | 17000 | 0.0634        | 0.7122         |
+ | 3.3667 | 17500 | 0.0639        | -              |
+ | 3.4629 | 18000 | 0.0657        | 0.7159         |
+ | 3.5591 | 18500 | 0.0658        | -              |
+ | 3.6553 | 19000 | 0.0627        | 0.7170         |
+ | 3.7514 | 19500 | 0.0648        | -              |
+ | 3.8476 | 20000 | 0.0638        | 0.7166         |
+ | 3.9438 | 20500 | 0.0613        | -              |
+ | 4.0    | 20792 | -             | 0.7182         |
+ | 4.0400 | 21000 | 0.061         | 0.7171         |
+ | 4.1362 | 21500 | 0.0583        | -              |
+ | 4.2324 | 22000 | 0.0602        | 0.7178         |
+ | 4.3286 | 22500 | 0.0599        | -              |
+ | 4.4248 | 23000 | 0.0579        | 0.7185         |
+ | 4.5210 | 23500 | 0.0586        | -              |
+ | 4.6172 | 24000 | 0.061         | 0.7181         |
+ | 4.7134 | 24500 | 0.0591        | -              |
+ | 4.8095 | 25000 | 0.0568        | 0.7189         |
+ | 4.9057 | 25500 | 0.057         | -              |
583
+
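The Epoch column in the Training Logs table appears to be the step count divided by the number of optimizer steps per epoch. The sketch below assumes 5198 steps per epoch (the step logged at epoch 1.0) and a 5-epoch run ending at step 25990 (the `global_step` in `trainer_state.json` from this commit). The function names are illustrative and are not part of the README or of any library.

```python
# Assumption: 5198 optimizer steps per epoch, read off the "1.0 | 5198" row.
STEPS_PER_EPOCH = 5198


def epoch_at(step: int) -> float:
    """Fractional epoch for a given global step."""
    return step / STEPS_PER_EPOCH


def epoch_boundaries(num_epochs: int) -> list[int]:
    """Global steps at which each full epoch ends."""
    return [STEPS_PER_EPOCH * e for e in range(1, num_epochs + 1)]


if __name__ == "__main__":
    # Rounded to four decimals, as in the table.
    print(round(epoch_at(500), 4))
    print(epoch_boundaries(5))
```

Under this assumption the epoch-end rows (5198, 10396, 15594, 20792) are the extra evaluations that fall between the regular 1000-step evaluations.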
584
+
585
+ ### Framework Versions
586
+ - Python: 3.11.11
587
+ - Sentence Transformers: 4.1.0
588
+ - Transformers: 4.51.1
589
+ - PyTorch: 2.5.1+cu124
590
+ - Accelerate: 1.3.0
591
+ - Datasets: 3.5.0
592
+ - Tokenizers: 0.21.0
593
+
594
+ ## Citation
595
+
596
+ ### BibTeX
597
+
598
+ #### Sentence Transformers
599
+ ```bibtex
600
+ @inproceedings{reimers-2019-sentence-bert,
601
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
602
+ author = "Reimers, Nils and Gurevych, Iryna",
603
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
604
+ month = "11",
605
+ year = "2019",
606
+ publisher = "Association for Computational Linguistics",
607
+ url = "https://arxiv.org/abs/1908.10084",
608
+ }
609
+ ```
610
+
611
+ #### CachedMultipleNegativesRankingLoss
612
+ ```bibtex
613
+ @misc{gao2021scaling,
614
+ title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
615
+ author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
616
+ year={2021},
617
+ eprint={2101.06983},
618
+ archivePrefix={arXiv},
619
+ primaryClass={cs.LG}
620
+ }
621
+ ```
622
+
623
+ <!--
624
+ ## Glossary
625
+
626
+ *Clearly define terms in order to be accessible across audiences.*
627
+ -->
628
+
629
+ <!--
630
+ ## Model Card Authors
631
+
632
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
633
+ -->
634
+
635
+ <!--
636
+ ## Model Card Contact
637
+
638
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
639
+ -->
biencoder-checkpoints/checkpoint-tinybiobert/config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "adapters": {
3
+ "adapters": {},
4
+ "config_map": {},
5
+ "fusion_config_map": {},
6
+ "fusions": {}
7
+ },
8
+ "architectures": [
9
+ "BertModel"
10
+ ],
11
+ "attention_probs_dropout_prob": 0.1,
12
+ "cell": {},
13
+ "classifier_dropout": null,
14
+ "emb_size": 312,
15
+ "hidden_act": "gelu",
16
+ "hidden_dropout_prob": 0.1,
17
+ "hidden_size": 312,
18
+ "initializer_range": 0.02,
19
+ "intermediate_size": 1200,
20
+ "layer_norm_eps": 1e-12,
21
+ "max_position_embeddings": 512,
22
+ "model_type": "bert",
23
+ "num_attention_heads": 12,
24
+ "num_hidden_layers": 4,
25
+ "pad_token_id": 0,
26
+ "position_embedding_type": "absolute",
27
+ "pre_trained": "",
28
+ "structure": [],
29
+ "torch_dtype": "float32",
30
+ "transformers_version": "4.51.1",
31
+ "type_vocab_size": 2,
32
+ "use_cache": true,
33
+ "vocab_size": 28996
34
+ }
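As a sanity check on the `config.json` above, the parameter count can be estimated from the listed dimensions and compared with the `model.safetensors` size recorded in this commit (55504328 bytes). This is a rough sketch that assumes the standard Hugging Face `BertModel` layout (word, position, and token-type embeddings, one embedding LayerNorm, per-layer attention and feed-forward blocks with biases, and a pooler). The helper name `bert_param_count` is hypothetical.

```python
def bert_param_count(vocab_size, hidden_size, intermediate_size,
                     num_layers, max_positions, type_vocab_size,
                     with_pooler=True):
    """Estimate trainable parameters of a standard BERT encoder."""
    # Three embedding tables plus one LayerNorm (weight + bias).
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden_size
    embeddings += 2 * hidden_size
    # Q, K, V and output projections (weight + bias) plus a LayerNorm.
    attention = 4 * (hidden_size * hidden_size + hidden_size) + 2 * hidden_size
    # Two feed-forward projections (weight + bias) plus a LayerNorm.
    ffn = (hidden_size * intermediate_size + intermediate_size
           + intermediate_size * hidden_size + hidden_size
           + 2 * hidden_size)
    pooler = hidden_size * hidden_size + hidden_size if with_pooler else 0
    return embeddings + num_layers * (attention + ffn) + pooler


# Values copied from config.json in this commit.
params = bert_param_count(
    vocab_size=28996, hidden_size=312, intermediate_size=1200,
    num_layers=4, max_positions=512, type_vocab_size=2,
)
weight_bytes = params * 4  # float32, per "torch_dtype"

if __name__ == "__main__":
    print(params, weight_bytes)
```

Under these assumptions the estimate is about 13.9M parameters, or 55496544 bytes of float32 weights. The remaining 7784 bytes of the recorded file size would be consistent with the safetensors header, though that attribution is an inference rather than something the commit states.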
biencoder-checkpoints/checkpoint-tinybiobert/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "4.1.0",
4
+ "transformers": "4.51.1",
5
+ "pytorch": "2.5.1+cu124"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
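The `similarity_fn_name` of `cosine` above means embeddings are compared by the cosine of the angle between them. A minimal pure-Python sketch of that score follows; it is illustrative only and is not the library's implementation, which operates on batched tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector has similarity 0.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because the score depends only on direction, scaling either embedding leaves the ranking of retrieved documents unchanged.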
biencoder-checkpoints/checkpoint-tinybiobert/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b5f745c82f787b597535ef55de93eb76aca5f6b4e78d903ec53201809f82da99
3
+ size 55504328
biencoder-checkpoints/checkpoint-tinybiobert/modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ }
14
+ ]
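`modules.json` chains a Transformer module into a Pooling module, and `1_Pooling/config.json` in this commit enables only `pooling_mode_mean_tokens`. The sketch below shows what masked mean pooling computes for a single sequence, using plain Python lists. It is an illustration under that reading of the config, not the `sentence_transformers.models.Pooling` code, and `mean_pool` is a hypothetical name.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions whose mask value is 0."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding sequence to avoid dividing by zero.
    count = max(count, 1)
    return [t / count for t in total]


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row is padding
    mask = [1, 1, 0]
    print(mean_pool(tokens, mask))
```

For this checkpoint each token vector would have 312 dimensions (`word_embedding_dimension`), so the pooled sentence embedding is also 312-dimensional.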
biencoder-checkpoints/checkpoint-tinybiobert/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:758791b468924176dbc0c0cf74d95b3fd8358e802d4783f599c1882fedafdb93
3
+ size 14244
biencoder-checkpoints/checkpoint-tinybiobert/scaler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3076546cf27c1d08da2eadd3cb8ac8adfe684c0e523d3df7772a350641740fef
3
+ size 988
biencoder-checkpoints/checkpoint-tinybiobert/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f6c588a45dcdcf37531cdcf6d4fba104a3658213b98d027a148ce4cef470750e
3
+ size 1064
biencoder-checkpoints/checkpoint-tinybiobert/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
biencoder-checkpoints/checkpoint-tinybiobert/special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
biencoder-checkpoints/checkpoint-tinybiobert/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
biencoder-checkpoints/checkpoint-tinybiobert/tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": false,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": false,
47
+ "extra_special_tokens": {},
48
+ "mask_token": "[MASK]",
49
+ "model_max_length": 512,
50
+ "pad_token": "[PAD]",
51
+ "sep_token": "[SEP]",
52
+ "strip_accents": null,
53
+ "tokenize_chinese_chars": true,
54
+ "tokenizer_class": "BertTokenizer",
55
+ "unk_token": "[UNK]"
56
+ }
biencoder-checkpoints/checkpoint-tinybiobert/trainer_state.json ADDED
@@ -0,0 +1,941 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 5.0,
6
+ "eval_steps": 1000,
7
+ "global_step": 25990,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.09619084263178146,
14
+ "grad_norm": 5.741635322570801,
15
+ "learning_rate": 9.980000000000001e-06,
16
+ "loss": 1.8469,
17
+ "step": 500
18
+ },
19
+ {
20
+ "epoch": 0.19238168526356292,
21
+ "grad_norm": 5.087311744689941,
22
+ "learning_rate": 1.9980000000000002e-05,
23
+ "loss": 0.301,
24
+ "step": 1000
25
+ },
26
+ {
27
+ "epoch": 0.19238168526356292,
28
+ "eval_cosine_accuracy@1": 0.4003615417611373,
29
+ "eval_cosine_accuracy@10": 0.7401495189656229,
30
+ "eval_cosine_accuracy@3": 0.5782523438936209,
31
+ "eval_cosine_accuracy@5": 0.6526441571174705,
32
+ "eval_cosine_map@100": 0.5158702186798423,
33
+ "eval_cosine_mrr@10": 0.5077505697419804,
34
+ "eval_cosine_ndcg@10": 0.5635344391210978,
35
+ "eval_cosine_precision@1": 0.4003615417611373,
36
+ "eval_cosine_precision@10": 0.07401495189656228,
37
+ "eval_cosine_precision@3": 0.19275078129787365,
38
+ "eval_cosine_precision@5": 0.1305288314234941,
39
+ "eval_cosine_recall@1": 0.4003615417611373,
40
+ "eval_cosine_recall@10": 0.7401495189656229,
41
+ "eval_cosine_recall@3": 0.5782523438936209,
42
+ "eval_cosine_recall@5": 0.6526441571174705,
43
+ "eval_runtime": 37.6057,
44
+ "eval_samples_per_second": 0.0,
45
+ "eval_steps_per_second": 0.0,
46
+ "step": 1000
47
+ },
48
+ {
49
+ "epoch": 0.2885725278953444,
50
+ "grad_norm": 2.9921863079071045,
51
+ "learning_rate": 1.9600640256102444e-05,
52
+ "loss": 0.2186,
53
+ "step": 1500
54
+ },
55
+ {
56
+ "epoch": 0.38476337052712584,
57
+ "grad_norm": 4.068124771118164,
58
+ "learning_rate": 1.9200480192076832e-05,
59
+ "loss": 0.1915,
60
+ "step": 2000
61
+ },
62
+ {
63
+ "epoch": 0.38476337052712584,
64
+ "eval_cosine_accuracy@1": 0.44365463570071695,
65
+ "eval_cosine_accuracy@10": 0.7951161223114162,
66
+ "eval_cosine_accuracy@3": 0.6363441387339911,
67
+ "eval_cosine_accuracy@5": 0.7084380170353576,
68
+ "eval_cosine_map@100": 0.5644053261135765,
69
+ "eval_cosine_mrr@10": 0.5572720521507379,
70
+ "eval_cosine_ndcg@10": 0.6145320896671304,
71
+ "eval_cosine_precision@1": 0.44365463570071695,
72
+ "eval_cosine_precision@10": 0.07951161223114162,
73
+ "eval_cosine_precision@3": 0.21211471291133036,
74
+ "eval_cosine_precision@5": 0.1416876034070715,
75
+ "eval_cosine_recall@1": 0.44365463570071695,
76
+ "eval_cosine_recall@10": 0.7951161223114162,
77
+ "eval_cosine_recall@3": 0.6363441387339911,
78
+ "eval_cosine_recall@5": 0.7084380170353576,
79
+ "eval_runtime": 37.3288,
80
+ "eval_samples_per_second": 0.0,
81
+ "eval_steps_per_second": 0.0,
82
+ "step": 2000
83
+ },
84
+ {
85
+ "epoch": 0.4809542131589073,
86
+ "grad_norm": 3.5689873695373535,
87
+ "learning_rate": 1.880032012805122e-05,
88
+ "loss": 0.1615,
89
+ "step": 2500
90
+ },
91
+ {
92
+ "epoch": 0.5771450557906888,
93
+ "grad_norm": 1.5700035095214844,
94
+ "learning_rate": 1.8400160064025612e-05,
95
+ "loss": 0.1504,
96
+ "step": 3000
97
+ },
98
+ {
99
+ "epoch": 0.5771450557906888,
100
+ "eval_cosine_accuracy@1": 0.46583736748575283,
101
+ "eval_cosine_accuracy@10": 0.8197193455481341,
102
+ "eval_cosine_accuracy@3": 0.6656045100802745,
103
+ "eval_cosine_accuracy@5": 0.7403946320240211,
104
+ "eval_cosine_map@100": 0.5887217637289748,
105
+ "eval_cosine_mrr@10": 0.5820531404138076,
106
+ "eval_cosine_ndcg@10": 0.6394519654007175,
107
+ "eval_cosine_precision@1": 0.46583736748575283,
108
+ "eval_cosine_precision@10": 0.0819719345548134,
109
+ "eval_cosine_precision@3": 0.22186817002675815,
110
+ "eval_cosine_precision@5": 0.14807892640480422,
111
+ "eval_cosine_recall@1": 0.46583736748575283,
112
+ "eval_cosine_recall@10": 0.8197193455481341,
113
+ "eval_cosine_recall@3": 0.6656045100802745,
114
+ "eval_cosine_recall@5": 0.7403946320240211,
115
+ "eval_runtime": 37.3165,
116
+ "eval_samples_per_second": 0.0,
117
+ "eval_steps_per_second": 0.0,
118
+ "step": 3000
119
+ },
120
+ {
121
+ "epoch": 0.6733358984224702,
122
+ "grad_norm": 2.3708999156951904,
123
+ "learning_rate": 1.8e-05,
124
+ "loss": 0.1451,
125
+ "step": 3500
126
+ },
127
+ {
128
+ "epoch": 0.7695267410542517,
129
+ "grad_norm": 2.4959521293640137,
130
+ "learning_rate": 1.7599839935974392e-05,
131
+ "loss": 0.1365,
132
+ "step": 4000
133
+ },
134
+ {
135
+ "epoch": 0.7695267410542517,
136
+ "eval_cosine_accuracy@1": 0.4831791163674245,
137
+ "eval_cosine_accuracy@10": 0.8363870335192107,
138
+ "eval_cosine_accuracy@3": 0.6838654329309394,
139
+ "eval_cosine_accuracy@5": 0.7581346896255898,
140
+ "eval_cosine_map@100": 0.6058547996468352,
141
+ "eval_cosine_mrr@10": 0.5995932655187352,
142
+ "eval_cosine_ndcg@10": 0.6568315628791503,
143
+ "eval_cosine_precision@1": 0.4831791163674245,
144
+ "eval_cosine_precision@10": 0.08363870335192108,
145
+ "eval_cosine_precision@3": 0.2279551443103131,
146
+ "eval_cosine_precision@5": 0.15162693792511794,
147
+ "eval_cosine_recall@1": 0.4831791163674245,
148
+ "eval_cosine_recall@10": 0.8363870335192107,
149
+ "eval_cosine_recall@3": 0.6838654329309394,
150
+ "eval_cosine_recall@5": 0.7581346896255898,
151
+ "eval_runtime": 37.9907,
152
+ "eval_samples_per_second": 0.0,
153
+ "eval_steps_per_second": 0.0,
154
+ "step": 4000
155
+ },
156
+ {
157
+ "epoch": 0.8657175836860331,
158
+ "grad_norm": 3.29150652885437,
159
+ "learning_rate": 1.720048019207683e-05,
160
+ "loss": 0.1247,
161
+ "step": 4500
162
+ },
163
+ {
164
+ "epoch": 0.9619084263178146,
165
+ "grad_norm": 1.8889851570129395,
166
+ "learning_rate": 1.6800320128051223e-05,
167
+ "loss": 0.126,
168
+ "step": 5000
169
+ },
170
+ {
171
+ "epoch": 0.9619084263178146,
172
+ "eval_cosine_accuracy@1": 0.49175807341136096,
173
+ "eval_cosine_accuracy@10": 0.846896255898033,
174
+ "eval_cosine_accuracy@3": 0.6944972118389607,
175
+ "eval_cosine_accuracy@5": 0.7665604510080275,
176
+ "eval_cosine_map@100": 0.6151092171416574,
177
+ "eval_cosine_mrr@10": 0.6091163941729371,
178
+ "eval_cosine_ndcg@10": 0.6665806678949275,
179
+ "eval_cosine_precision@1": 0.49175807341136096,
180
+ "eval_cosine_precision@10": 0.0846896255898033,
181
+ "eval_cosine_precision@3": 0.23149907061298688,
182
+ "eval_cosine_precision@5": 0.15331209020160547,
183
+ "eval_cosine_recall@1": 0.49175807341136096,
184
+ "eval_cosine_recall@10": 0.846896255898033,
185
+ "eval_cosine_recall@3": 0.6944972118389607,
186
+ "eval_cosine_recall@5": 0.7665604510080275,
187
+ "eval_runtime": 37.3193,
188
+ "eval_samples_per_second": 0.0,
189
+ "eval_steps_per_second": 0.0,
190
+ "step": 5000
191
+ },
192
+ {
193
+ "epoch": 1.058099268949596,
194
+ "grad_norm": 2.8940165042877197,
195
+ "learning_rate": 1.640016006402561e-05,
196
+ "loss": 0.1102,
197
+ "step": 5500
198
+ },
199
+ {
200
+ "epoch": 1.1542901115813775,
201
+ "grad_norm": 3.093466281890869,
202
+ "learning_rate": 1.6000000000000003e-05,
203
+ "loss": 0.1075,
204
+ "step": 6000
205
+ },
206
+ {
207
+ "epoch": 1.1542901115813775,
208
+ "eval_cosine_accuracy@1": 0.49788589987131565,
209
+ "eval_cosine_accuracy@10": 0.8543109259145781,
210
+ "eval_cosine_accuracy@3": 0.7028004166921993,
211
+ "eval_cosine_accuracy@5": 0.7760279428886574,
212
+ "eval_cosine_map@100": 0.6222303929011103,
213
+ "eval_cosine_mrr@10": 0.6164308912486005,
214
+ "eval_cosine_ndcg@10": 0.6739777946779848,
215
+ "eval_cosine_precision@1": 0.49788589987131565,
216
+ "eval_cosine_precision@10": 0.08543109259145781,
217
+ "eval_cosine_precision@3": 0.2342668055640664,
218
+ "eval_cosine_precision@5": 0.15520558857773148,
219
+ "eval_cosine_recall@1": 0.49788589987131565,
220
+ "eval_cosine_recall@10": 0.8543109259145781,
221
+ "eval_cosine_recall@3": 0.7028004166921993,
222
+ "eval_cosine_recall@5": 0.7760279428886574,
223
+ "eval_runtime": 38.0227,
224
+ "eval_samples_per_second": 0.0,
225
+ "eval_steps_per_second": 0.0,
226
+ "step": 6000
227
+ },
228
+ {
229
+ "epoch": 1.250480954213159,
230
+ "grad_norm": 1.681726098060608,
231
+ "learning_rate": 1.560064025610244e-05,
232
+ "loss": 0.1025,
233
+ "step": 6500
234
+ },
235
+ {
236
+ "epoch": 1.3466717968449404,
237
+ "grad_norm": 1.8751877546310425,
238
+ "learning_rate": 1.5200480192076832e-05,
239
+ "loss": 0.1011,
240
+ "step": 7000
241
+ },
242
+ {
243
+ "epoch": 1.3466717968449404,
244
+ "eval_cosine_accuracy@1": 0.5032477480237759,
245
+ "eval_cosine_accuracy@10": 0.8573442000122556,
246
+ "eval_cosine_accuracy@3": 0.707702677860163,
247
+ "eval_cosine_accuracy@5": 0.7806544518659232,
248
+ "eval_cosine_map@100": 0.6268520263379107,
249
+ "eval_cosine_mrr@10": 0.6210327794945518,
250
+ "eval_cosine_ndcg@10": 0.6782188559387655,
251
+ "eval_cosine_precision@1": 0.5032477480237759,
252
+ "eval_cosine_precision@10": 0.08573442000122555,
253
+ "eval_cosine_precision@3": 0.2359008926200543,
254
+ "eval_cosine_precision@5": 0.15613089037318464,
255
+ "eval_cosine_recall@1": 0.5032477480237759,
256
+ "eval_cosine_recall@10": 0.8573442000122556,
257
+ "eval_cosine_recall@3": 0.707702677860163,
258
+ "eval_cosine_recall@5": 0.7806544518659232,
259
+ "eval_runtime": 37.2837,
260
+ "eval_samples_per_second": 0.0,
261
+ "eval_steps_per_second": 0.0,
262
+ "step": 7000
263
+ },
264
+ {
265
+ "epoch": 1.4428626394767219,
266
+ "grad_norm": 1.1830039024353027,
267
+ "learning_rate": 1.4800320128051222e-05,
268
+ "loss": 0.099,
269
+ "step": 7500
270
+ },
271
+ {
272
+ "epoch": 1.5390534821085033,
273
+ "grad_norm": 2.8491501808166504,
274
+ "learning_rate": 1.4400160064025611e-05,
275
+ "loss": 0.0961,
276
+ "step": 8000
277
+ },
278
+ {
279
+ "epoch": 1.5390534821085033,
280
+ "eval_cosine_accuracy@1": 0.5155034009436853,
281
+ "eval_cosine_accuracy@10": 0.8688338746246707,
282
+ "eval_cosine_accuracy@3": 0.7188246828849807,
283
+ "eval_cosine_accuracy@5": 0.7922360438752375,
284
+ "eval_cosine_map@100": 0.6387374052710177,
285
+ "eval_cosine_mrr@10": 0.6333171505217987,
286
+ "eval_cosine_ndcg@10": 0.6903269438115052,
287
+ "eval_cosine_precision@1": 0.5155034009436853,
288
+ "eval_cosine_precision@10": 0.08688338746246706,
289
+ "eval_cosine_precision@3": 0.23960822762832687,
290
+ "eval_cosine_precision@5": 0.1584472087750475,
291
+ "eval_cosine_recall@1": 0.5155034009436853,
292
+ "eval_cosine_recall@10": 0.8688338746246707,
293
+ "eval_cosine_recall@3": 0.7188246828849807,
294
+ "eval_cosine_recall@5": 0.7922360438752375,
295
+ "eval_runtime": 37.1861,
296
+ "eval_samples_per_second": 0.0,
297
+ "eval_steps_per_second": 0.0,
298
+ "step": 8000
299
+ },
300
+ {
301
+ "epoch": 1.6352443247402846,
302
+ "grad_norm": 2.94862699508667,
303
+ "learning_rate": 1.4000800320128052e-05,
304
+ "loss": 0.0902,
305
+ "step": 8500
306
+ },
307
+ {
308
+ "epoch": 1.7314351673720663,
309
+ "grad_norm": 2.007293462753296,
310
+ "learning_rate": 1.3600640256102442e-05,
311
+ "loss": 0.0914,
312
+ "step": 9000
313
+ },
314
+ {
315
+ "epoch": 1.7314351673720663,
316
+ "eval_cosine_accuracy@1": 0.5171272749555733,
317
+ "eval_cosine_accuracy@10": 0.8689257920215699,
318
+ "eval_cosine_accuracy@3": 0.7210000612782645,
319
+ "eval_cosine_accuracy@5": 0.7930939395796311,
320
+ "eval_cosine_map@100": 0.6402035789072501,
321
+ "eval_cosine_mrr@10": 0.6347871236858099,
322
+ "eval_cosine_ndcg@10": 0.6914745561225969,
323
+ "eval_cosine_precision@1": 0.5171272749555733,
324
+ "eval_cosine_precision@10": 0.086892579202157,
325
+ "eval_cosine_precision@3": 0.2403333537594215,
326
+ "eval_cosine_precision@5": 0.15861878791592623,
327
+ "eval_cosine_recall@1": 0.5171272749555733,
328
+ "eval_cosine_recall@10": 0.8689257920215699,
329
+ "eval_cosine_recall@3": 0.7210000612782645,
330
+ "eval_cosine_recall@5": 0.7930939395796311,
331
+ "eval_runtime": 37.3089,
332
+ "eval_samples_per_second": 0.0,
333
+ "eval_steps_per_second": 0.0,
334
+ "step": 9000
335
+ },
336
+ {
337
+ "epoch": 1.8276260100038475,
338
+ "grad_norm": 1.6253125667572021,
339
+ "learning_rate": 1.3200480192076832e-05,
340
+ "loss": 0.0894,
341
+ "step": 9500
342
+ },
343
+ {
344
+ "epoch": 1.9238168526356292,
345
+ "grad_norm": 2.037698984146118,
346
+ "learning_rate": 1.2800320128051222e-05,
347
+ "loss": 0.0881,
348
+ "step": 10000
349
+ },
350
+ {
351
+ "epoch": 1.9238168526356292,
352
+ "eval_cosine_accuracy@1": 0.5226729579018322,
353
+ "eval_cosine_accuracy@10": 0.8739812488510326,
354
+ "eval_cosine_accuracy@3": 0.7287211226178074,
355
+ "eval_cosine_accuracy@5": 0.8007230835222746,
356
+ "eval_cosine_map@100": 0.6459750840989936,
357
+ "eval_cosine_mrr@10": 0.64070738218282,
358
+ "eval_cosine_ndcg@10": 0.6972440512806575,
359
+ "eval_cosine_precision@1": 0.5226729579018322,
360
+ "eval_cosine_precision@10": 0.08739812488510325,
361
+ "eval_cosine_precision@3": 0.24290704087260248,
362
+ "eval_cosine_precision@5": 0.16014461670445493,
363
+ "eval_cosine_recall@1": 0.5226729579018322,
364
+ "eval_cosine_recall@10": 0.8739812488510326,
365
+ "eval_cosine_recall@3": 0.7287211226178074,
366
+ "eval_cosine_recall@5": 0.8007230835222746,
367
+ "eval_runtime": 37.9487,
368
+ "eval_samples_per_second": 0.0,
369
+ "eval_steps_per_second": 0.0,
370
+ "step": 10000
371
+ },
372
+ {
373
+ "epoch": 2.0200076952674104,
374
+ "grad_norm": 1.377611517906189,
375
+ "learning_rate": 1.2400960384153663e-05,
376
+ "loss": 0.0848,
377
+ "step": 10500
378
+ },
379
+ {
380
+ "epoch": 2.116198537899192,
381
+ "grad_norm": 1.8393025398254395,
382
+ "learning_rate": 1.2000800320128053e-05,
383
+ "loss": 0.0779,
384
+ "step": 11000
385
+ },
386
+ {
387
+ "epoch": 2.116198537899192,
388
+ "eval_cosine_accuracy@1": 0.5261964581163061,
389
+ "eval_cosine_accuracy@10": 0.8776885838593051,
390
+ "eval_cosine_accuracy@3": 0.7318156749800846,
391
+ "eval_cosine_accuracy@5": 0.8031435749739567,
392
+ "eval_cosine_map@100": 0.6493668761542414,
393
+ "eval_cosine_mrr@10": 0.64421845652697,
394
+ "eval_cosine_ndcg@10": 0.7007940310013228,
395
+ "eval_cosine_precision@1": 0.5261964581163061,
396
+ "eval_cosine_precision@10": 0.0877688583859305,
397
+ "eval_cosine_precision@3": 0.24393855832669484,
398
+ "eval_cosine_precision@5": 0.16062871499479137,
399
+ "eval_cosine_recall@1": 0.5261964581163061,
400
+ "eval_cosine_recall@10": 0.8776885838593051,
401
+ "eval_cosine_recall@3": 0.7318156749800846,
402
+ "eval_cosine_recall@5": 0.8031435749739567,
403
+ "eval_runtime": 37.6496,
404
+ "eval_samples_per_second": 0.0,
405
+ "eval_steps_per_second": 0.0,
406
+ "step": 11000
407
+ },
408
+ {
409
+ "epoch": 2.2123893805309733,
410
+ "grad_norm": 2.0715949535369873,
411
+ "learning_rate": 1.1600640256102443e-05,
412
+ "loss": 0.0756,
413
+ "step": 11500
414
+ },
415
+ {
416
+ "epoch": 2.308580223162755,
417
+ "grad_norm": 3.181828498840332,
418
+ "learning_rate": 1.1200480192076833e-05,
419
+ "loss": 0.075,
420
+ "step": 12000
421
+ },
422
+ {
423
+ "epoch": 2.308580223162755,
424
+ "eval_cosine_accuracy@1": 0.5265028494393039,
425
+ "eval_cosine_accuracy@10": 0.8780256143146026,
426
+ "eval_cosine_accuracy@3": 0.7324897358906796,
427
+ "eval_cosine_accuracy@5": 0.8056253446902384,
428
+ "eval_cosine_map@100": 0.6502720334254667,
429
+ "eval_cosine_mrr@10": 0.6450791729768786,
430
+ "eval_cosine_ndcg@10": 0.7015823044400388,
431
+ "eval_cosine_precision@1": 0.5265028494393039,
432
+ "eval_cosine_precision@10": 0.08780256143146026,
433
+ "eval_cosine_precision@3": 0.24416324529689318,
434
+ "eval_cosine_precision@5": 0.16112506893804765,
435
+ "eval_cosine_recall@1": 0.5265028494393039,
436
+ "eval_cosine_recall@10": 0.8780256143146026,
437
+ "eval_cosine_recall@3": 0.7324897358906796,
438
+ "eval_cosine_recall@5": 0.8056253446902384,
439
+ "eval_runtime": 38.432,
440
+ "eval_samples_per_second": 0.0,
441
+ "eval_steps_per_second": 0.0,
442
+ "step": 12000
443
+ },
444
+ {
445
+ "epoch": 2.4047710657945363,
446
+ "grad_norm": 2.3642358779907227,
447
+ "learning_rate": 1.0801120448179271e-05,
448
+ "loss": 0.0785,
449
+ "step": 12500
450
+ },
451
+ {
452
+ "epoch": 2.500961908426318,
453
+ "grad_norm": 1.315789818763733,
454
+ "learning_rate": 1.0400960384153661e-05,
455
+ "loss": 0.0744,
456
+ "step": 13000
457
+ },
458
+ {
459
+ "epoch": 2.500961908426318,
460
+ "eval_cosine_accuracy@1": 0.5271769103498989,
461
+ "eval_cosine_accuracy@10": 0.880415466633985,
462
+ "eval_cosine_accuracy@3": 0.7335314663888719,
463
+ "eval_cosine_accuracy@5": 0.8069121882468289,
464
+ "eval_cosine_map@100": 0.6508831760712028,
465
+ "eval_cosine_mrr@10": 0.6458145729440025,
466
+ "eval_cosine_ndcg@10": 0.7026735805144717,
467
+ "eval_cosine_precision@1": 0.5271769103498989,
468
+ "eval_cosine_precision@10": 0.08804154666339849,
469
+ "eval_cosine_precision@3": 0.2445104887962906,
470
+ "eval_cosine_precision@5": 0.16138243764936577,
471
+ "eval_cosine_recall@1": 0.5271769103498989,
472
+ "eval_cosine_recall@10": 0.880415466633985,
473
+ "eval_cosine_recall@3": 0.7335314663888719,
474
+ "eval_cosine_recall@5": 0.8069121882468289,
475
+ "eval_runtime": 37.3573,
476
+ "eval_samples_per_second": 0.0,
477
+ "eval_steps_per_second": 0.0,
478
+ "step": 13000
479
+ },
480
+ {
481
+ "epoch": 2.597152751058099,
482
+ "grad_norm": 2.1482696533203125,
483
+ "learning_rate": 1.0000800320128053e-05,
484
+ "loss": 0.0739,
485
+ "step": 13500
486
+ },
487
+ {
488
+ "epoch": 2.693343593689881,
489
+ "grad_norm": 1.6254030466079712,
490
+ "learning_rate": 9.600640256102441e-06,
491
+ "loss": 0.0741,
492
+ "step": 14000
493
+ },
494
+ {
495
+ "epoch": 2.693343593689881,
496
+ "eval_cosine_accuracy@1": 0.5324468411054599,
497
+ "eval_cosine_accuracy@10": 0.8847968625528525,
498
+ "eval_cosine_accuracy@3": 0.7399044059072247,
499
+ "eval_cosine_accuracy@5": 0.811079110239598,
500
+ "eval_cosine_map@100": 0.6559858785481972,
501
+ "eval_cosine_mrr@10": 0.6510460710419446,
502
+ "eval_cosine_ndcg@10": 0.7077137677382884,
503
+ "eval_cosine_precision@1": 0.5324468411054599,
504
+ "eval_cosine_precision@10": 0.08847968625528524,
505
+ "eval_cosine_precision@3": 0.2466348019690749,
506
+ "eval_cosine_precision@5": 0.16221582204791962,
507
+ "eval_cosine_recall@1": 0.5324468411054599,
508
+ "eval_cosine_recall@10": 0.8847968625528525,
509
+ "eval_cosine_recall@3": 0.7399044059072247,
510
+ "eval_cosine_recall@5": 0.811079110239598,
511
+ "eval_runtime": 37.4282,
512
+ "eval_samples_per_second": 0.0,
513
+ "eval_steps_per_second": 0.0,
514
+ "step": 14000
515
+ },
516
+ {
517
+ "epoch": 2.789534436321662,
518
+ "grad_norm": 1.0776854753494263,
519
+ "learning_rate": 9.201280512204884e-06,
520
+ "loss": 0.0704,
521
+ "step": 14500
522
+ },
523
+ {
524
+ "epoch": 2.8857252789534438,
525
+ "grad_norm": 2.2815675735473633,
526
+ "learning_rate": 8.801120448179272e-06,
527
+ "loss": 0.074,
528
+ "step": 15000
529
+ },
530
+ {
531
+ "epoch": 2.8857252789534438,
532
+ "eval_cosine_accuracy@1": 0.5350511673509406,
533
+ "eval_cosine_accuracy@10": 0.8862369017709418,
534
+ "eval_cosine_accuracy@3": 0.7405784668178197,
535
+ "eval_cosine_accuracy@5": 0.813101292971383,
536
+ "eval_cosine_map@100": 0.658100408637169,
537
+ "eval_cosine_mrr@10": 0.6532255589696401,
538
+ "eval_cosine_ndcg@10": 0.7097064138010793,
539
+ "eval_cosine_precision@1": 0.5350511673509406,
540
+ "eval_cosine_precision@10": 0.08862369017709419,
541
+ "eval_cosine_precision@3": 0.24685948893927323,
542
+ "eval_cosine_precision@5": 0.16262025859427662,
543
+ "eval_cosine_recall@1": 0.5350511673509406,
544
+ "eval_cosine_recall@10": 0.8862369017709418,
545
+ "eval_cosine_recall@3": 0.7405784668178197,
546
+ "eval_cosine_recall@5": 0.813101292971383,
547
+ "eval_runtime": 37.393,
548
+ "eval_samples_per_second": 0.0,
549
+ "eval_steps_per_second": 0.0,
550
+ "step": 15000
551
+ },
552
+ {
553
+ "epoch": 2.981916121585225,
554
+ "grad_norm": 1.556275486946106,
555
+ "learning_rate": 8.400960384153662e-06,
556
+ "loss": 0.0696,
557
+ "step": 15500
558
+ },
559
+ {
560
+ "epoch": 3.0781069642170067,
561
+ "grad_norm": 2.5441489219665527,
562
+ "learning_rate": 8.000800320128052e-06,
563
+ "loss": 0.0663,
564
+ "step": 16000
565
+ },
566
+ {
567
+ "epoch": 3.0781069642170067,
568
+ "eval_cosine_accuracy@1": 0.5396163980636068,
569
+ "eval_cosine_accuracy@10": 0.8885654758257246,
570
+ "eval_cosine_accuracy@3": 0.7458790367056805,
571
+ "eval_cosine_accuracy@5": 0.8164409583920583,
572
+ "eval_cosine_map@100": 0.6622726196460629,
573
+ "eval_cosine_mrr@10": 0.6574670605011067,
574
+ "eval_cosine_ndcg@10": 0.7135277681055804,
575
+ "eval_cosine_precision@1": 0.5396163980636068,
576
+ "eval_cosine_precision@10": 0.08885654758257244,
577
+ "eval_cosine_precision@3": 0.24862634556856014,
578
+ "eval_cosine_precision@5": 0.1632881916784117,
579
+ "eval_cosine_recall@1": 0.5396163980636068,
580
+ "eval_cosine_recall@10": 0.8885654758257246,
581
+ "eval_cosine_recall@3": 0.7458790367056805,
582
+ "eval_cosine_recall@5": 0.8164409583920583,
583
+ "eval_runtime": 37.3857,
584
+ "eval_samples_per_second": 0.0,
585
+ "eval_steps_per_second": 0.0,
586
+ "step": 16000
587
+ },
588
+ {
589
+ "epoch": 3.174297806848788,
590
+ "grad_norm": 0.8971004486083984,
591
+ "learning_rate": 7.600640256102442e-06,
592
+ "loss": 0.0656,
593
+ "step": 16500
594
+ },
595
+ {
596
+ "epoch": 3.2704886494805696,
597
+ "grad_norm": 1.499013900756836,
598
+ "learning_rate": 7.2012805122048825e-06,
599
+ "loss": 0.0634,
600
+ "step": 17000
601
+ },
602
+ {
603
+ "epoch": 3.2704886494805696,
604
+ "eval_cosine_accuracy@1": 0.537441019670323,
605
+ "eval_cosine_accuracy@10": 0.8884122801642258,
606
+ "eval_cosine_accuracy@3": 0.7433972669893989,
607
+ "eval_cosine_accuracy@5": 0.8161039279367608,
608
+ "eval_cosine_map@100": 0.6605980760721326,
609
+ "eval_cosine_mrr@10": 0.6558092645927442,
610
+ "eval_cosine_ndcg@10": 0.7122137222786563,
611
+ "eval_cosine_precision@1": 0.537441019670323,
612
+ "eval_cosine_precision@10": 0.08884122801642257,
613
+ "eval_cosine_precision@3": 0.24779908899646627,
614
+ "eval_cosine_precision@5": 0.1632207855873522,
615
+ "eval_cosine_recall@1": 0.537441019670323,
616
+ "eval_cosine_recall@10": 0.8884122801642258,
617
+ "eval_cosine_recall@3": 0.7433972669893989,
618
+ "eval_cosine_recall@5": 0.8161039279367608,
619
+ "eval_runtime": 37.408,
620
+ "eval_samples_per_second": 0.0,
621
+ "eval_steps_per_second": 0.0,
622
+ "step": 17000
623
+ },
624
+ {
625
+ "epoch": 3.366679492112351,
626
+ "grad_norm": 0.85924232006073,
627
+ "learning_rate": 6.8011204481792725e-06,
628
+ "loss": 0.0639,
629
+ "step": 17500
630
+ },
631
+ {
632
+ "epoch": 3.4628703347441325,
633
+ "grad_norm": 1.6038638353347778,
634
+ "learning_rate": 6.400960384153662e-06,
635
+ "loss": 0.0657,
636
+ "step": 18000
637
+ },
638
+ {
639
+ "epoch": 3.4628703347441325,
640
+ "eval_cosine_accuracy@1": 0.5418224155891905,
641
+ "eval_cosine_accuracy@10": 0.8918132238495006,
642
+ "eval_cosine_accuracy@3": 0.7472271585268705,
643
+ "eval_cosine_accuracy@5": 0.8188308107114407,
644
+ "eval_cosine_map@100": 0.6642658993896512,
645
+ "eval_cosine_mrr@10": 0.6595858780834957,
646
+ "eval_cosine_ndcg@10": 0.7158873591426906,
647
+ "eval_cosine_precision@1": 0.5418224155891905,
648
+ "eval_cosine_precision@10": 0.08918132238495005,
649
+ "eval_cosine_precision@3": 0.2490757195089568,
650
+ "eval_cosine_precision@5": 0.16376616214228812,
651
+ "eval_cosine_recall@1": 0.5418224155891905,
652
+ "eval_cosine_recall@10": 0.8918132238495006,
653
+ "eval_cosine_recall@3": 0.7472271585268705,
654
+ "eval_cosine_recall@5": 0.8188308107114407,
655
+ "eval_runtime": 38.4035,
656
+ "eval_samples_per_second": 0.0,
657
+ "eval_steps_per_second": 0.0,
658
+ "step": 18000
659
+ },
660
+ {
661
+ "epoch": 3.5590611773759138,
662
+ "grad_norm": 1.283136010169983,
663
+ "learning_rate": 6.000800320128052e-06,
664
+ "loss": 0.0658,
665
+ "step": 18500
666
+ },
667
+ {
668
+ "epoch": 3.655252020007695,
669
+ "grad_norm": 0.8852151036262512,
670
+ "learning_rate": 5.600640256102441e-06,
671
+ "loss": 0.0627,
672
+ "step": 19000
673
+ },
674
+ {
675
+ "epoch": 3.655252020007695,
676
+ "eval_cosine_accuracy@1": 0.5437220417917764,
677
+ "eval_cosine_accuracy@10": 0.8912617194681046,
678
+ "eval_cosine_accuracy@3": 0.7493106195232551,
679
+ "eval_cosine_accuracy@5": 0.820393406458729,
680
+ "eval_cosine_map@100": 0.6659324483571365,
681
+ "eval_cosine_mrr@10": 0.661192231861396,
682
+ "eval_cosine_ndcg@10": 0.7170066447079718,
683
+ "eval_cosine_precision@1": 0.5437220417917764,
684
+ "eval_cosine_precision@10": 0.08912617194681045,
685
+ "eval_cosine_precision@3": 0.24977020650775167,
686
+ "eval_cosine_precision@5": 0.16407868129174585,
687
+ "eval_cosine_recall@1": 0.5437220417917764,
688
+ "eval_cosine_recall@10": 0.8912617194681046,
689
+ "eval_cosine_recall@3": 0.7493106195232551,
690
+ "eval_cosine_recall@5": 0.820393406458729,
691
+ "eval_runtime": 37.4746,
692
+ "eval_samples_per_second": 0.0,
693
+ "eval_steps_per_second": 0.0,
694
+ "step": 19000
695
+ },
696
+ {
697
+ "epoch": 3.7514428626394767,
698
+ "grad_norm": 2.4619545936584473,
699
+ "learning_rate": 5.200480192076831e-06,
700
+ "loss": 0.0648,
701
+ "step": 19500
702
+ },
703
+ {
704
+ "epoch": 3.8476337052712584,
705
+ "grad_norm": 1.4234368801116943,
706
+ "learning_rate": 4.801120448179272e-06,
707
+ "loss": 0.0638,
708
+ "step": 20000
709
+ },
710
+ {
711
+ "epoch": 3.8476337052712584,
712
+ "eval_cosine_accuracy@1": 0.5417917764568907,
713
+ "eval_cosine_accuracy@10": 0.8925792021569949,
714
+ "eval_cosine_accuracy@3": 0.7492799803909553,
715
+ "eval_cosine_accuracy@5": 0.8206997977817269,
716
+ "eval_cosine_map@100": 0.6648556804097842,
717
+ "eval_cosine_mrr@10": 0.6602137736030805,
718
+ "eval_cosine_ndcg@10": 0.7165749273596944,
719
+ "eval_cosine_precision@1": 0.5417917764568907,
720
+ "eval_cosine_precision@10": 0.08925792021569949,
721
+ "eval_cosine_precision@3": 0.24975999346365177,
722
+ "eval_cosine_precision@5": 0.16413995955634536,
723
+ "eval_cosine_recall@1": 0.5417917764568907,
724
+ "eval_cosine_recall@10": 0.8925792021569949,
725
+ "eval_cosine_recall@3": 0.7492799803909553,
726
+ "eval_cosine_recall@5": 0.8206997977817269,
727
+ "eval_runtime": 37.3088,
728
+ "eval_samples_per_second": 0.0,
729
+ "eval_steps_per_second": 0.0,
730
+ "step": 20000
731
+ },
732
+ {
733
+ "epoch": 3.9438245479030396,
734
+ "grad_norm": 2.059201240539551,
735
+ "learning_rate": 4.400960384153662e-06,
736
+ "loss": 0.0613,
737
+ "step": 20500
738
+ },
739
+ {
740
+ "epoch": 4.040015390534821,
741
+ "grad_norm": 2.8143606185913086,
742
+ "learning_rate": 4.000800320128051e-06,
743
+ "loss": 0.061,
744
+ "step": 21000
745
+ },
746
+ {
747
+ "epoch": 4.040015390534821,
748
+ "eval_cosine_accuracy@1": 0.5425577547643851,
749
+ "eval_cosine_accuracy@10": 0.8922115325693977,
750
+ "eval_cosine_accuracy@3": 0.7497395673754519,
751
+ "eval_cosine_accuracy@5": 0.8215270543538207,
752
+ "eval_cosine_map@100": 0.6656333161549883,
753
+ "eval_cosine_mrr@10": 0.6609160054936519,
754
+ "eval_cosine_ndcg@10": 0.7170515802988034,
755
+ "eval_cosine_precision@1": 0.5425577547643851,
756
+ "eval_cosine_precision@10": 0.08922115325693976,
757
+ "eval_cosine_precision@3": 0.24991318912515062,
758
+ "eval_cosine_precision@5": 0.16430541087076414,
759
+ "eval_cosine_recall@1": 0.5425577547643851,
760
+ "eval_cosine_recall@10": 0.8922115325693977,
761
+ "eval_cosine_recall@3": 0.7497395673754519,
762
+ "eval_cosine_recall@5": 0.8215270543538207,
763
+ "eval_runtime": 38.0609,
764
+ "eval_samples_per_second": 0.0,
765
+ "eval_steps_per_second": 0.0,
766
+ "step": 21000
767
+ },
768
+ {
769
+ "epoch": 4.136206233166603,
770
+ "grad_norm": 1.6916602849960327,
771
+ "learning_rate": 3.6006402561024412e-06,
772
+ "loss": 0.0583,
773
+ "step": 21500
774
+ },
775
+ {
776
+ "epoch": 4.232397075798384,
777
+ "grad_norm": 2.4838967323303223,
778
+ "learning_rate": 3.2012805122048822e-06,
779
+ "loss": 0.0602,
780
+ "step": 22000
781
+ },
782
+ {
783
+ "epoch": 4.232397075798384,
784
+ "eval_cosine_accuracy@1": 0.5435688461302776,
785
+ "eval_cosine_accuracy@10": 0.8929775108768919,
786
+ "eval_cosine_accuracy@3": 0.7499540413015503,
787
+ "eval_cosine_accuracy@5": 0.8217415282799191,
788
+ "eval_cosine_map@100": 0.6663101850931815,
789
+ "eval_cosine_mrr@10": 0.6616394902426588,
790
+ "eval_cosine_ndcg@10": 0.7177557535908854,
791
+ "eval_cosine_precision@1": 0.5435688461302776,
792
+ "eval_cosine_precision@10": 0.0892977510876892,
793
+ "eval_cosine_precision@3": 0.24998468043385008,
794
+ "eval_cosine_precision@5": 0.16434830565598385,
795
+ "eval_cosine_recall@1": 0.5435688461302776,
796
+ "eval_cosine_recall@10": 0.8929775108768919,
797
+ "eval_cosine_recall@3": 0.7499540413015503,
798
+ "eval_cosine_recall@5": 0.8217415282799191,
799
+ "eval_runtime": 37.3184,
800
+ "eval_samples_per_second": 0.0,
801
+ "eval_steps_per_second": 0.0,
802
+ "step": 22000
803
+ },
804
+ {
805
+ "epoch": 4.328587918430165,
806
+ "grad_norm": 1.5632020235061646,
807
+ "learning_rate": 2.8011204481792718e-06,
808
+ "loss": 0.0599,
809
+ "step": 22500
810
+ },
811
+ {
812
+ "epoch": 4.424778761061947,
813
+ "grad_norm": 1.0466374158859253,
814
+ "learning_rate": 2.4009603841536618e-06,
815
+ "loss": 0.0579,
816
+ "step": 23000
817
+ },
818
+ {
819
+ "epoch": 4.424778761061947,
820
+ "eval_cosine_accuracy@1": 0.5439977939824744,
821
+ "eval_cosine_accuracy@10": 0.8934370978613886,
822
+ "eval_cosine_accuracy@3": 0.7516391935780379,
823
+ "eval_cosine_accuracy@5": 0.8220479196029169,
824
+ "eval_cosine_map@100": 0.6671602335542375,
825
+ "eval_cosine_mrr@10": 0.6625040609008773,
826
+ "eval_cosine_ndcg@10": 0.7185422064601371,
827
+ "eval_cosine_precision@1": 0.5439977939824744,
828
+ "eval_cosine_precision@10": 0.08934370978613886,
829
+ "eval_cosine_precision@3": 0.2505463978593459,
830
+ "eval_cosine_precision@5": 0.16440958392058336,
831
+ "eval_cosine_recall@1": 0.5439977939824744,
832
+ "eval_cosine_recall@10": 0.8934370978613886,
833
+ "eval_cosine_recall@3": 0.7516391935780379,
834
+ "eval_cosine_recall@5": 0.8220479196029169,
835
+ "eval_runtime": 37.6028,
836
+ "eval_samples_per_second": 0.0,
837
+ "eval_steps_per_second": 0.0,
838
+ "step": 23000
839
+ },
840
+ {
841
+ "epoch": 4.520969603693729,
842
+ "grad_norm": 2.3420305252075195,
843
+ "learning_rate": 2.0008003201280513e-06,
844
+ "loss": 0.0586,
845
+ "step": 23500
846
+ },
847
+ {
848
+ "epoch": 4.61716044632551,
849
+ "grad_norm": 1.0525851249694824,
850
+ "learning_rate": 1.6006402561024411e-06,
851
+ "loss": 0.061,
852
+ "step": 24000
853
+ },
854
+ {
855
+ "epoch": 4.61716044632551,
856
+ "eval_cosine_accuracy@1": 0.5431398982780807,
857
+ "eval_cosine_accuracy@10": 0.8937741283166861,
858
+ "eval_cosine_accuracy@3": 0.750413628286047,
859
+ "eval_cosine_accuracy@5": 0.8224462283228139,
860
+ "eval_cosine_map@100": 0.6664220156098503,
861
+ "eval_cosine_mrr@10": 0.6617887952206918,
862
+ "eval_cosine_ndcg@10": 0.71806761340154,
863
+ "eval_cosine_precision@1": 0.5431398982780807,
864
+ "eval_cosine_precision@10": 0.0893774128316686,
865
+ "eval_cosine_precision@3": 0.250137876095349,
866
+ "eval_cosine_precision@5": 0.1644892456645628,
867
+ "eval_cosine_recall@1": 0.5431398982780807,
868
+ "eval_cosine_recall@10": 0.8937741283166861,
869
+ "eval_cosine_recall@3": 0.750413628286047,
870
+ "eval_cosine_recall@5": 0.8224462283228139,
871
+ "eval_runtime": 38.033,
872
+ "eval_samples_per_second": 0.0,
873
+ "eval_steps_per_second": 0.0,
874
+ "step": 24000
875
+ },
876
+ {
877
+ "epoch": 4.713351288957291,
878
+ "grad_norm": 0.856259286403656,
879
+ "learning_rate": 1.201280512204882e-06,
880
+ "loss": 0.0591,
881
+ "step": 24500
882
+ },
883
+ {
884
+ "epoch": 4.8095421315890725,
885
+ "grad_norm": 2.3736488819122314,
886
+ "learning_rate": 8.011204481792719e-07,
887
+ "loss": 0.0568,
888
+ "step": 25000
889
+ },
890
+ {
891
+ "epoch": 4.8095421315890725,
892
+ "eval_cosine_accuracy@1": 0.5447024940253692,
893
+ "eval_cosine_accuracy@10": 0.893927323978185,
894
+ "eval_cosine_accuracy@3": 0.751761750107237,
895
+ "eval_cosine_accuracy@5": 0.8228138979104112,
896
+ "eval_cosine_map@100": 0.6674861654698347,
897
+ "eval_cosine_mrr@10": 0.6628477055180666,
898
+ "eval_cosine_ndcg@10": 0.7189132708892791,
899
+ "eval_cosine_precision@1": 0.5447024940253692,
900
+ "eval_cosine_precision@10": 0.08939273239781849,
901
+ "eval_cosine_precision@3": 0.2505872500357456,
902
+ "eval_cosine_precision@5": 0.16456277958208224,
903
+ "eval_cosine_recall@1": 0.5447024940253692,
904
+ "eval_cosine_recall@10": 0.893927323978185,
905
+ "eval_cosine_recall@3": 0.751761750107237,
906
+ "eval_cosine_recall@5": 0.8228138979104112,
907
+ "eval_runtime": 37.6063,
908
+ "eval_samples_per_second": 0.0,
909
+ "eval_steps_per_second": 0.0,
910
+ "step": 25000
911
+ },
912
+ {
913
+ "epoch": 4.905732974220854,
914
+ "grad_norm": 0.7087424397468567,
915
+ "learning_rate": 4.009603841536615e-07,
916
+ "loss": 0.057,
917
+ "step": 25500
918
+ }
919
+ ],
920
+ "logging_steps": 500,
921
+ "max_steps": 25990,
922
+ "num_input_tokens_seen": 0,
923
+ "num_train_epochs": 5,
924
+ "save_steps": 1000,
925
+ "stateful_callbacks": {
926
+ "TrainerControl": {
927
+ "args": {
928
+ "should_epoch_stop": false,
929
+ "should_evaluate": false,
930
+ "should_log": false,
931
+ "should_save": true,
932
+ "should_training_stop": true
933
+ },
934
+ "attributes": {}
935
+ }
936
+ },
937
+ "total_flos": 0.0,
938
+ "train_batch_size": 64,
939
+ "trial_name": null,
940
+ "trial_params": null
941
+ }
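A minimal sketch for reading the log above, assuming the file is the Hugging Face Trainer state file (usually `trainer_state.json`) and that its records sit under the `log_history` key; the key is not visible in this part of the diff. The helper names `eval_entries`, `best_eval`, and `load_log_history` are hypothetical. The sketch picks the evaluation record with the highest `eval_cosine_ndcg@10` and checks a relationship visible in the logged values: `precision@k` equals `recall@k / k`, which is what one would expect if each query has exactly one relevant document.

```python
import json
from pathlib import Path


def eval_entries(log_history):
    """Return only the evaluation records (those carrying eval_* keys)."""
    return [e for e in log_history if any(k.startswith("eval_") for k in e)]


def best_eval(log_history, metric="eval_cosine_ndcg@10"):
    """Return the evaluation record with the highest value of `metric`."""
    candidates = [e for e in eval_entries(log_history) if metric in e]
    return max(candidates, key=lambda e: e[metric])


def load_log_history(path):
    """Read the trainer state file; 'log_history' is assumed to be the key."""
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    return state["log_history"]


# Records copied from the log above (steps 24000, 25000, 25500), trimmed to a few keys.
sample = [
    {
        "step": 24000,
        "eval_cosine_ndcg@10": 0.71806761340154,
        "eval_cosine_recall@10": 0.8937741283166861,
        "eval_cosine_precision@10": 0.0893774128316686,
    },
    {
        "step": 25000,
        "eval_cosine_ndcg@10": 0.7189132708892791,
        "eval_cosine_recall@10": 0.893927323978185,
        "eval_cosine_precision@10": 0.08939273239781849,
    },
    {"step": 25500, "loss": 0.057},  # training record, no eval_* keys
]

best = best_eval(sample)
print(best["step"], best["eval_cosine_ndcg@10"])

# precision@10 should match recall@10 / 10 up to floating-point noise.
gap = abs(best["eval_cosine_precision@10"] * 10 - best["eval_cosine_recall@10"])
print(gap < 1e-9)
```

To run it on the real file, pass the result of `load_log_history(...)` to `best_eval` instead of `sample`; the path depends on where the checkpoint was downloaded.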
biencoder-checkpoints/checkpoint-tinybiobert/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2583d3517ab53755d75547370ad94b89e3ea203066a5969526881be0e9c58c83
3
+ size 5560
biencoder-checkpoints/checkpoint-tinybiobert/vocab.txt ADDED
The diff for this file is too large to render. See raw diff