Added tinybiobert based biencoder

Files changed (16)

biencoder-checkpoints/checkpoint-tinybiobert/1_Pooling/config.json +10 -0
biencoder-checkpoints/checkpoint-tinybiobert/README.md +639 -0
biencoder-checkpoints/checkpoint-tinybiobert/config.json +34 -0
biencoder-checkpoints/checkpoint-tinybiobert/config_sentence_transformers.json +10 -0
biencoder-checkpoints/checkpoint-tinybiobert/model.safetensors +3 -0
biencoder-checkpoints/checkpoint-tinybiobert/modules.json +14 -0
biencoder-checkpoints/checkpoint-tinybiobert/rng_state.pth +3 -0
biencoder-checkpoints/checkpoint-tinybiobert/scaler.pt +3 -0
biencoder-checkpoints/checkpoint-tinybiobert/scheduler.pt +3 -0
biencoder-checkpoints/checkpoint-tinybiobert/sentence_bert_config.json +4 -0
biencoder-checkpoints/checkpoint-tinybiobert/special_tokens_map.json +37 -0
biencoder-checkpoints/checkpoint-tinybiobert/tokenizer.json +0 -0
biencoder-checkpoints/checkpoint-tinybiobert/tokenizer_config.json +56 -0
biencoder-checkpoints/checkpoint-tinybiobert/trainer_state.json +941 -0
biencoder-checkpoints/checkpoint-tinybiobert/training_args.bin +3 -0
biencoder-checkpoints/checkpoint-tinybiobert/vocab.txt +0 -0

biencoder-checkpoints/checkpoint-tinybiobert/1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "word_embedding_dimension": 312,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
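With `pooling_mode_mean_tokens` set to `true` and every other mode `false`, the sentence embedding is the average of the token embeddings, with padding positions excluded through the attention mask. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors (in place of the model's 312 dimensions) are assumptions made for illustration.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding sequence to avoid division by zero.
    count = max(sum(attention_mask), 1)
    return [value / count for value in summed]


# Toy input (hypothetical values): three real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [3.0, 4.0]
```

The padding vector `[9.0, 9.0]` does not affect the result because its mask entry is 0.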

biencoder-checkpoints/checkpoint-tinybiobert/README.md ADDED

	@@ -0,0 +1,639 @@

+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:332672
+- loss:CachedMultipleNegativesRankingLoss
+base_model: nlpie/tiny-biobert
+widget:
+- source_sentence: mechanism of primordial follicle growth initiation
+  sentences:
+  - at first sight, rna splicing enables eukaryotes to increase the coding potential
+    of their genomes. we shall return to this idea again in this chapter and the next,
+    but we first need to describe the cellular machinery that performs this remarkable
+    task. o(a) (b) 5˜5˜aoh aho 3˜5˜exon sequence intron sequence 3˜exon 2˜
+  - pituitary gonadotropins maintain a normal ovarian reserve by promoting the general
+    health of the ovary. however, the rate at which resting primordial follicles enter
+    the growth process appears to be independent of pituitary gonadotropins. the decision
+    of a resting follicle to enter the early growth phase is primarily dependent on
+    intraovarian paracrine factors produced by both the follicle cells and oocytes.
+    the gamete in primordial follicles the gamete is derived from oogonia that have
+    entered the first meiotic division; such oogonia are referred to as primary oocytes.
+    primary oocytes progress through most of prophase of the first meiotic division
+  - primary sclerosing cholangitis (psc) is a disorder characterized by both intrahepatic
+    and extrahepatic bile duct inflammation and fibrosis, frequently leading to biliary
+    cirrhosis and hepatic failure; approximately 5% of patients with uc have psc,
+    but 50–75% of patients with psc have ibd. psc occurs less often in patients with
+    cd. although it can be recognized after the diagnosis of ibd, psc can be detected
+    earlier or even years after proctocolectomy. consistent with this, the immunogenetic
+    basis for psc appears to be overlapping but distinct from uc based on gwas, although
+    both ibd and psc are commonly panca positive. most patients have no symptoms at
+    the time of diagnosis; when symptoms are present, they consist of fatigue, jaundice,
+    abdominal pain, fever, anorexia, and malaise. the traditional gold standard diagnostic
+    test is endoscopic retrograde cholangiopancreatography (ercp), but magnetic resonance
+    cholangiopancreatography (mrcp) is also sensitive and specific. mrcp is
+- source_sentence: naming and numbering system for fatty acid carbons
+  sentences:
+  - 'and meta-analysis examining the impact of incision on outcomes after abdominal
+    surgery. am j surg. 2013;206:400-409. doi: 10.1016/j.amjsurg.2012.11.008bilsel
+    y, abci i. the search for ideal hernia repair; mesh materi-als and types. int
+    j surg. 2012;10:317-321. doi: 10.1016/j.ijsu.2012.05.002brown sr, tiernan j. transverse
+    verses midline incisions for abdom-inal surgery. cochrane database syst rev. 2005;(4):cd005199.
+    doi: 10.1002/14651858.cd005199.pub2caro-tarrago a, olona casas c, jimenez salido
+    a, et al. prevention of incisional hernia in midline laparotomy with an onlay
+    mesh: a randomized clinical trial. world j surg. 2014;38:2223-2230. doi: 10.1007/s00268-014-2510-6conze
+    j, kingsnorth an, flament jb, et al. randomized clinical trial comparing lightweight
+    composite mesh with polyester or polypropylene mesh for incisional hernia repair.
+    br j surg. 2005;92:1488-1493. doi: 10.1002/bjs.5208de vries reilingh ts, van goor
+    h, rosman c, et al. “compo-nents separation technique” for the'
+  - this entailed a risk of serious injury. a similar syndrome in malaysia and indonesia
+    is known as latah and in siberia as miryachit. this syndrome has been framed in
+    psychologic terms as conditioned responses (saint-hilaire et al) or as culturally
+    determined behavior (simons). possibly some of the complex secondary phenomena
+    can be explained in this way, but the stereotyped onset with an uncontrollable
+    startle and the familial occurrence attest to a biologic basis. the most common
+    mutation is in the 1-subunit of the inhibitory glycine receptor glra1 (shiang
+    et al) but other glycine receptor–related genes have been implicated in other
+    cases. as pointed out by suhren and associates and by kurczynski, the condition
+    is transmitted in some families as an autosomal dominant trait. the subject has
+    been reviewed by wilkins and colleagues and by ryan and associates.
+  - 'the common names and structures of some fatty acids of physiologic importance
+    are listed in figure 16.4. in humans, fatty acids with an even number of carbon
+    atoms (16, 18, or 20) predominate, with longer fatty acids (>22 carbons) being
+    found in the brain. the carbon atoms are numbered, beginning with the carbonyl
+    carbon as carbon 1. the number before the colon indicates the number of carbons
+    in the chain, and those after the colon indicate the numbers and positions (relative
+    to the carboxyl end) of double bonds. for example, as denoted in figure 16.4,
+    arachidonic acid, 20:4(5,8,11,14), is 20 carbons long and has four double bonds
+    (between carbons 5–6, 8–9, 11–12, and 14–15). [note: carbon 2, the carbon to which
+    the carboxyl group is attached, is also called the α-carbon, carbon 3 is the βcarbon,
+    and carbon 4 is the γ-carbon. the carbon of the terminal methyl group is called
+    the ω-carbon regardless of the chain length.] the double bonds in a fatty acid
+    can also be referenced relative'
+- source_sentence: how does the extent of disease at the start of androgen depletion
+    therapy relate to prognosis?
+  sentences:
+  - (c) localization of the mre11 complex to damaged dna as visualized by antibodies
+    against the mre11 subunit (red). mre11 is a nuclease that processes damaged dna
+    in preparation for homologous recombination (see figure 5–48). (a), (b), and (c)
+    were processed 30 minutes after x-irradiation. (from b.e. nelms et al., science
+    280:590– 592, 1998. with permission from aaas.) figure 5–53 chromosome crossing-over
+    occurs in meiosis. meiosis is the process by which a diploid cell gives rise to
+    four haploid germ cells, as described in detail in chapter 17. meiosis produces
+    germ cells in which the paternal and maternal genetic information (red and blue)
+    has been reassorted through chromosome crossovers. in addition, many short regions
+    of gene conversion occur, as indicated.
+  - endometriosis infertility managment hormonal suppression of endometriosis typically
+    has a minimal benefit for endometriosis-related infertility (265). in minimal
+    to mild disease, laparoscopic ablation appears to significantly improve pregnancy
+    rates when compared to diagnostic laparoscopy alone, although there remains some
+    dissent (267,269). one major randomized trial reported 31% versus 17% pregnancy
+    rates over 3 years with a subsequent meta-analysis supporting these findings (265,266,270,271).
+    although authors have estimated that eight laparoscopies involving treatment of
+    mild or minimal endometriosis would need to be performed for each pregnancy gained,
+    that number is likely to be much higher given that not everyone who undergoes
+    laparoscopy will have endometriosis (267,270). the benefit of surgical management
+    of endometriosis is even less clear for moderate to severe disease, although removal
+    of endometriomas may be indicated prior to ivf when they would interfere with
+    oocyte
+  - proportional to disease extent at the time androgen depletion is first started,
+    whereas the degree of psa decline at 6 months has been shown to be prognostic.
+    in a large-scale trial, psa nadir proved prognostic.
+- source_sentence: side effects of pamidronate and zoledronate infusions
+  sentences:
+  - to allow an early diagnosis of pancreatic cancer. despite the fact that many tumor
+    markers such as ca19-9 have been studied, there are still no effective screening
+    tests for pancreatic cancer. research tak-ing advantage of recent advances in
+    genomics, gene expression analysis, and proteomics has demonstrated thousands
+    of genes and corresponding proteins that are differentially expressed in pancreatic
+    tumors that have potential for early detection of pan-creatic cancer.316 some
+    of these proteins would be expected to be expressed at the cell surface or in
+    pancreatic juice and may become useful as biomarkers for pancreatic cancer in
+    the future.in patients presenting with jaundice, a reasonable first diagnostic
+    imaging study is abdominal ultrasound. if bile duct dilation is not seen, hepatocellular
+    disease is likely. demonstra-tion of cholelithiasis and bile duct dilation suggests
+    a diagnosis of choledocholithiasis, and the next logical step would be ercp to
+    clear the bile duct. in the
+  - in general, the prognosis for regular ovulatory cycles and subsequent normal fertility
+    in young women who experience an episode of abnormal bleeding is good, particularly
+    for patients who develop abnormal bleeding as a result of anovulation within the
+    first years after menarche and in whom there are no signs of other specific conditions.
+    some girls, including those in whom there is an underlying medical cause, such
+    as pcos, will continue to have abnormal bleeding into middle and late adolescence
+    and adulthood and will benefit from the ongoing use of oral contraceptives to
+    manage hirsutism, acne, and irregular periods. ovulation induction may ultimately
+    be necessary to achieve fertility in these individuals, although teens should
+    be advised that they should not assume that they are infertile. individuals with
+    coagulopathies may benefit from ongoing oral contraceptive use, use of tranexamic
+    acid, or intranasal desmopressin (99).
+  - pamidronate, 60–90 mg, infused over 2–4 hours, and zoledronate, 4 mg, infused
+    over at least 15 minutes, have been approved for the treatment of hypercalcemia
+    of malignancy and have largely replaced the less effective etidronate for this
+    indication. the bisphosphonate effects generally persist for weeks, but treatment
+    can be repeated after a 7-day interval if necessary and if renal function is not
+    impaired. some patients experience a self-limited flu-like syndrome after the
+    initial infusion, but subsequent infusions generally do not have this side effect.
+    repeated doses of these drugs have been linked to renal deterioration and osteonecrosis
+    of the jaw, but this adverse effect is rare.
+- source_sentence: comparison of erythromycin and azithromycin in terms of cost and
+    tolerability
+  sentences:
+  - alternative agents are erythromycin and azithromycin. azithromycin is more expensive
+    but offers the advantages of better gastrointestinal tolerability, once-daily
+    dosing, and a 5-day treatment course. resistance to erythromycin and other macrolides
+    is common among isolates from several countries, including spain, italy, finland,
+    japan, and korea. macrolide resistance may be becoming more prevalent elsewhere
+    with the increasing use of this class of antibiotics. in areas with resistance
+    rates exceeding 5–10%, macrolides should be avoided unless results of susceptibility
+    testing are known. follow-up culture after treatment is no longer routinely recommended
+    but may be warranted in selected cases, such as those involving patients or families
+    with frequent streptococcal infections or those occurring in situations in which
+    the risk of arf is thought to be high (e.g., when cases of arf have recently been
+    reported in the community).
+  - 50 mm/h. the main diagnostic difficulty in diagnosis arises when the emg performed
+    early in the course of illness shows conduction block that simulates a demyelinating
+    polyneuropathy. nerve biopsy should then settle the issue.
+  - idioventricular rhythms three or more ventricular beats at a rate slower than
+    100 beats/min are termed idioventricular rhythm (fig. 277-1c). automaticity is
+    the likely mechanism. idioventricular rhythms are common during acute mi (chap.
+    295) and may emerge during sinus bradycardia. atropine may be administered to
+    increase the sinus rates if the loss of atrioventricular synchrony leads to hemodynamic
+    compromise. this rhythm is also common in patients with cardiomyopathies or sleep
+    apnea. it can also be idiopathic, often emerging when the sinus rate slows during
+    sleep. therapy should target any underlying cause and correction of bradycardia.
+    specific therapy for asymptomatic idioventricular rhythm is not necessary.
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+metrics:
+- cosine_accuracy@1
+- cosine_accuracy@3
+- cosine_accuracy@5
+- cosine_accuracy@10
+- cosine_precision@1
+- cosine_precision@3
+- cosine_precision@5
+- cosine_precision@10
+- cosine_recall@1
+- cosine_recall@3
+- cosine_recall@5
+- cosine_recall@10
+- cosine_ndcg@10
+- cosine_mrr@10
+- cosine_map@100
+model-index:
+- name: SentenceTransformer based on nlpie/tiny-biobert
+  results:
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: Unknown
+      type: unknown
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.5447024940253692
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.751761750107237
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.8228138979104112
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.893927323978185
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.5447024940253692
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.2505872500357456
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.16456277958208224
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.08939273239781849
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.5447024940253692
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.751761750107237
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.8228138979104112
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.893927323978185
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.7189132708892791
+      name: Cosine NDCG@10
+    - type: cosine_mrr@10
+      value: 0.6628477055180666
+      name: Cosine MRR@10
+    - type: cosine_map@100
+      value: 0.6674861654698347
+      name: Cosine MAP@100
+---
+# SentenceTransformer based on nlpie/tiny-biobert
+This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [nlpie/tiny-biobert](https://huggingface.co/nlpie/tiny-biobert). It maps sentences and paragraphs to a 312-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [nlpie/tiny-biobert](https://huggingface.co/nlpie/tiny-biobert) <!-- at revision a49b9101d3e9af1f646a43cf3524231a0d1404a1 -->
+- **Maximum Sequence Length:** 512 tokens
+- **Output Dimensionality:** 312 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 312, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
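+# Download from the 🤗 Hub. "sentence_transformers_model_id" is a placeholder:
+# replace it with the published model ID, or with a local path such as
+# "biencoder-checkpoints/checkpoint-tinybiobert".
+model = SentenceTransformer("sentence_transformers_model_id")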
+# Run inference
+sentences = [
+    'comparison of erythromycin and azithromycin in terms of cost and tolerability',
+    'alternative agents are erythromycin and azithromycin. azithromycin is more expensive but offers the advantages of better gastrointestinal tolerability, once-daily dosing, and a 5-day treatment course. resistance to erythromycin and other macrolides is common among isolates from several countries, including spain, italy, finland, japan, and korea. macrolide resistance may be becoming more prevalent elsewhere with the increasing use of this class of antibiotics. in areas with resistance rates exceeding 5–10%, macrolides should be avoided unless results of susceptibility testing are known. follow-up culture after treatment is no longer routinely recommended but may be warranted in selected cases, such as those involving patients or families with frequent streptococcal infections or those occurring in situations in which the risk of arf is thought to be high (e.g., when cases of arf have recently been reported in the community).',
+    '50 mm/h. the main diagnostic difficulty in diagnosis arises when the emg performed early in the course of illness shows conduction block that simulates a demyelinating polyneuropathy. nerve biopsy should then settle the issue.',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 312)
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# torch.Size([3, 3])
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+## Evaluation
+### Metrics
+#### Information Retrieval
+* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+| Metric              | Value      |
+|:--------------------|:-----------|
+| cosine_accuracy@1   | 0.5447     |
+| cosine_accuracy@3   | 0.7518     |
+| cosine_accuracy@5   | 0.8228     |
+| cosine_accuracy@10  | 0.8939     |
+| cosine_precision@1  | 0.5447     |
+| cosine_precision@3  | 0.2506     |
+| cosine_precision@5  | 0.1646     |
+| cosine_precision@10 | 0.0894     |
+| cosine_recall@1     | 0.5447     |
+| cosine_recall@3     | 0.7518     |
+| cosine_recall@5     | 0.8228     |
+| cosine_recall@10    | 0.8939     |
+| **cosine_ndcg@10**  | **0.7189** |
+| cosine_mrr@10       | 0.6628     |
+| cosine_map@100      | 0.6675     |
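The table shows recall@k equal to accuracy@k and precision@k equal to accuracy@k divided by k (for example, 0.7518 / 3 ≈ 0.2506). That pattern is what one would expect if each query has exactly one relevant passage, although the card does not state this. Under that assumption, the metrics can be sketched as follows; the function `retrieval_metrics` and the passage IDs are hypothetical, and this is not the `InformationRetrievalEvaluator` code itself.

```python
from typing import Dict, List


def retrieval_metrics(ranked_ids: List[List[str]], relevant_id: List[str], k: int) -> Dict[str, float]:
    """Accuracy, precision, recall and MRR at k, assuming one relevant passage per query."""
    hits = 0
    reciprocal_ranks = 0.0
    for ranking, gold in zip(ranked_ids, relevant_id):
        top_k = ranking[:k]
        if gold in top_k:
            hits += 1
            reciprocal_ranks += 1.0 / (top_k.index(gold) + 1)
    n = len(ranked_ids)
    accuracy = hits / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one of the k results can be relevant
        "recall": accuracy,         # the single relevant passage is either found or not
        "mrr": reciprocal_ranks / n,
    }


# Toy rankings (hypothetical IDs): hits at rank 1 and rank 2, plus one miss.
rankings = [["p1", "p2", "p3"], ["p5", "p4", "p6"], ["p9", "p8", "p7"]]
gold = ["p1", "p4", "p0"]
metrics = retrieval_metrics(rankings, gold, k=3)
print(metrics["mrr"])  # 0.5
```

In this toy run two of three queries are hits, so accuracy@3 and recall@3 are both 2/3 and precision@3 is 2/9.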
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### Unnamed Dataset
+* Size: 332,672 training samples
+* Columns: <code>sentence_0</code> and <code>sentence_1</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                       | sentence_1                                                                          |
+  |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+  | type    | string                                                                           | string                                                                              |
+  | details | <ul><li>min: 6 tokens</li><li>mean: 15.8 tokens</li><li>max: 40 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 185.45 tokens</li><li>max: 446 tokens</li></ul> |
+* Samples:
+  | sentence_0                                                                | sentence_1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+  |:--------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>watershed area capillaries in spinal cord</code>                    | <code>the posterior medullary arteries form the paired posterior spinal arteries that supply the dorsal third of the cord by means of direct penetrating vessels and a plexus of pial vessels (similar to that of the ventral cord, with which it anastomoses freely). within the cord substance, then, there is a “watershed” area of capillaries where the penetrating branches of the anterior spinal artery meet the penetrating branches of the posterior spinal arteries and the branches of the circumferential pial network. all spinal segments, because of the variable size of collateral arteries, do not have the same abundance of circulatory protection.</code>                                                                                                                                                                                   |
+  | <code>sulfa drugs allergy symptoms and reactions</code>                   | <code>drugs of abuse (eg, amphetamines, cocaine, drugs of abuse (eg, heroin/opioids) lsd), meperidine sympathomimetics parasympathomimetics (eg, pilocarpine), organophosphates sulfa drugs sulfonamide antibiotics, sulfasalazine, scary sulfa pharm facts probenecid, furosemide, acetazolamide, celecoxib, thiazides, sulfonylureas. patients with sulfa allergies may develop fever, urinary tract infection, stevens-johnson syndrome, hemolytic anemia, thrombocytopenia, agranulocytosis, acute interstitial nephritis, and urticaria (hives). “medicine is a science of uncertainty and an art of probability.” “there are two kinds of statistics: the kind you look up and the kind you make up.” “on a long enough timeline, the survival rate for everyone drops to zero.” “there are three kinds of lies: lies, damned lies, and statistics.”</code> |
+  | <code>hla genotype association with type 1 diabetes susceptibility</code> | <code>fig. 15.38 population studies show association of susceptibility to type 1 diabetes with hla genotype. the hla genotypes (determined by serotyping) of patients with diabetes (lower panel) are not representative of those found in the general population (upper panel). almost all patients with diabetes express hla‑dr3 and/or hla‑dr4, and hla‑dr3/dr4 heterozygosity is greatly overrepresented in diabetics compared with controls. these alleles are linked tightly to hla‑dq alleles that confer susceptibility to type 1 diabetes. by contrast, hla‑dr2 protects against the development of diabetes and is found only extremely rarely in patients with diabetes. the small letter x represents any allele other than dr2, dr3, or dr4. family studies of hla haplotypes in type 1 diabetes</code>                                              |
+* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim",
+      "mini_batch_size": 32
+  }
+  ```
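For intuition, the objective behind these parameters can be sketched in plain Python: each query is scored against every passage in the batch by cosine similarity (`similarity_fct`), the scores are multiplied by `scale`, and cross-entropy is taken with the paired passage as the correct class, so the other passages act as in-batch negatives. The cached variant is understood to produce the same loss while processing the batch in chunks of `mini_batch_size` to save memory; that chunking is not shown here. The names `cos_sim` and `mnr_loss` and the toy vectors are assumptions for illustration only.

```python
import math
from typing import List


def cos_sim(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mnr_loss(queries: List[List[float]], passages: List[List[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy where passages[i] is the positive for queries[i]."""
    losses = []
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, passage) for passage in passages]
        # Numerically stable log-sum-exp over all passages in the batch.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(logit - peak) for logit in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy 2-d embeddings (hypothetical): aligned pairs versus swapped pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)  # close to 0 and close to 20, respectively
```

A larger `scale` sharpens the softmax, which makes the penalty for ranking a negative above the positive grow roughly linearly with the similarity gap.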
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 64
+- `per_device_eval_batch_size`: 64
+- `num_train_epochs`: 5
+- `fp16`: True
+- `multi_dataset_batch_sampler`: round_robin
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: steps
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 64
+- `per_device_eval_batch_size`: 64
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1
+- `num_train_epochs`: 5
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.0
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: False
+- `fp16`: True
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `tp_size`: 0
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: False
+- `prompts`: None
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: round_robin
+</details>
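The step counts in the Training Logs table can be cross-checked against these settings. The sketch below assumes a single training device (the card does not state the device count) together with the listed `gradient_accumulation_steps` of 1, and it models the `linear` scheduler with zero warmup as a straight decay from the base learning rate to zero. The constant and function names are hypothetical.

```python
import math

DATASET_SIZE = 332_672  # training samples
BATCH_SIZE = 64         # per_device_train_batch_size, one device assumed
EPOCHS = 5              # num_train_epochs
BASE_LR = 5e-05         # learning_rate

steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear decay, no warmup."""
    return BASE_LR * max(0.0, (total_steps - step) / total_steps)


print(steps_per_epoch)         # 5198
print(total_steps)             # 25990
print(linear_lr(total_steps))  # 0.0
```

The value 5198 matches the step logged at epoch 1.0, which supports the single-device assumption.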
+### Training Logs
+| Epoch  | Step  | Training Loss | cosine_ndcg@10 |
+|:------:|:-----:|:-------------:|:--------------:|
+| 0.0962 | 500   | 1.8469        | -              |
+| 0.1924 | 1000  | 0.301         | 0.5635         |
+| 0.2886 | 1500  | 0.2186        | -              |
+| 0.3848 | 2000  | 0.1915        | 0.6145         |
+| 0.4810 | 2500  | 0.1615        | -              |
+| 0.5771 | 3000  | 0.1504        | 0.6395         |
+| 0.6733 | 3500  | 0.1451        | -              |
+| 0.7695 | 4000  | 0.1365        | 0.6568         |
+| 0.8657 | 4500  | 0.1247        | -              |
+| 0.9619 | 5000  | 0.126         | 0.6666         |
+| 1.0    | 5198  | -             | 0.6692         |
+| 1.0581 | 5500  | 0.1102        | -              |
+| 1.1543 | 6000  | 0.1075        | 0.6740         |
+| 1.2505 | 6500  | 0.1025        | -              |
+| 1.3467 | 7000  | 0.1011        | 0.6782         |
+| 1.4429 | 7500  | 0.099         | -              |
+| 1.5391 | 8000  | 0.0961        | 0.6903         |
+| 1.6352 | 8500  | 0.0902        | -              |
+| 1.7314 | 9000  | 0.0914        | 0.6915         |
+| 1.8276 | 9500  | 0.0894        | -              |
+| 1.9238 | 10000 | 0.0881        | 0.6972         |
+| 2.0    | 10396 | -             | 0.7002         |
+| 2.0200 | 10500 | 0.0848        | -              |
+| 2.1162 | 11000 | 0.0779        | 0.7008         |
+| 2.2124 | 11500 | 0.0756        | -              |
+| 2.3086 | 12000 | 0.075         | 0.7016         |
+| 2.4048 | 12500 | 0.0785        | -              |
+| 2.5010 | 13000 | 0.0744        | 0.7027         |
+| 2.5972 | 13500 | 0.0739        | -              |
+| 2.6933 | 14000 | 0.0741        | 0.7077         |
+| 2.7895 | 14500 | 0.0704        | -              |
+| 2.8857 | 15000 | 0.074         | 0.7097         |
+| 2.9819 | 15500 | 0.0696        | -              |
+| 3.0    | 15594 | -             | 0.7127         |
+| 3.0781 | 16000 | 0.0663        | 0.7135         |
+| 3.1743 | 16500 | 0.0656        | -              |
+| 3.2705 | 17000 | 0.0634        | 0.7122         |
+| 3.3667 | 17500 | 0.0639        | -              |
+| 3.4629 | 18000 | 0.0657        | 0.7159         |
+| 3.5591 | 18500 | 0.0658        | -              |
+| 3.6553 | 19000 | 0.0627        | 0.7170         |
+| 3.7514 | 19500 | 0.0648        | -              |
+| 3.8476 | 20000 | 0.0638        | 0.7166         |
+| 3.9438 | 20500 | 0.0613        | -              |
+| 4.0    | 20792 | -             | 0.7182         |
+| 4.0400 | 21000 | 0.061         | 0.7171         |
+| 4.1362 | 21500 | 0.0583        | -              |
+| 4.2324 | 22000 | 0.0602        | 0.7178         |
+| 4.3286 | 22500 | 0.0599        | -              |
+| 4.4248 | 23000 | 0.0579        | 0.7185         |
+| 4.5210 | 23500 | 0.0586        | -              |
+| 4.6172 | 24000 | 0.061         | 0.7181         |
+| 4.7134 | 24500 | 0.0591        | -              |
+| 4.8095 | 25000 | 0.0568        | 0.7189         |
+| 4.9057 | 25500 | 0.057         | -              |
+### Framework Versions
+- Python: 3.11.11
+- Sentence Transformers: 4.1.0
+- Transformers: 4.51.1
+- PyTorch: 2.5.1+cu124
+- Accelerate: 1.3.0
+- Datasets: 3.5.0
+- Tokenizers: 0.21.0
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### CachedMultipleNegativesRankingLoss
+```bibtex
+@misc{gao2021scaling,
+    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
+    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
+    year={2021},
+    eprint={2101.06983},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG}
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
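
The Epoch and Step columns in the Training Logs table are tied together by a fixed number of optimizer steps per epoch. The following is a minimal sketch of that arithmetic, using only numbers that appear in this commit: the `dataset_size:332672` tag, the epoch boundary logged at step 5198, and `global_step` 25990 in `trainer_state.json`. The samples-per-step figure it derives is an inference from those numbers, not a value stated in this section.

```python
# Values copied from the model card and trainer_state.json in this commit.
DATASET_SIZE = 332_672    # from the `dataset_size:332672` tag
STEPS_PER_EPOCH = 5_198   # step logged at epoch 1.0 in the Training Logs table
GLOBAL_STEP = 25_990      # `global_step` in trainer_state.json

# Inferred, not stated here: training samples consumed per optimizer step.
samples_per_step = DATASET_SIZE / STEPS_PER_EPOCH


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step / STEPS_PER_EPOCH


print(samples_per_step)          # 64.0
print(round(epoch_at(500), 4))   # 0.0962, the first row of the table
print(epoch_at(GLOBAL_STEP))     # 5.0
```

The table's last row is step 25500, so the final state at step 25990 (epoch 5.0) is recorded only in `trainer_state.json`.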

biencoder-checkpoints/checkpoint-tinybiobert/config.json ADDED Viewed

	@@ -0,0 +1,34 @@

+{
+  "adapters": {
+    "adapters": {},
+    "config_map": {},
+    "fusion_config_map": {},
+    "fusions": {}
+  },
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "cell": {},
+  "classifier_dropout": null,
+  "emb_size": 312,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 312,
+  "initializer_range": 0.02,
+  "intermediate_size": 1200,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 4,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "pre_trained": "",
+  "structure": [],
+  "torch_dtype": "float32",
+  "transformers_version": "4.51.1",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 28996
+}

biencoder-checkpoints/checkpoint-tinybiobert/config_sentence_transformers.json ADDED Viewed

	@@ -0,0 +1,10 @@

+{
+  "__version__": {
+    "sentence_transformers": "4.1.0",
+    "transformers": "4.51.1",
+    "pytorch": "2.5.1+cu124"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
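
`similarity_fn_name` is set to `cosine`, which is the similarity behind the `cosine_*` metrics in the model card and in `trainer_state.json`. The sketch below shows how a `cosine_ndcg@10`-style score could be computed for one query. It is an illustration, not the evaluator's actual code. It assumes exactly one relevant document per query, which the logged values suggest (precision@k equals accuracy@k divided by k) but this section does not state. The vectors and document ids are made-up toy data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ndcg_at_k_single_relevant(ranked_ids, relevant_id, k=10):
    """NDCG@k under the assumption of exactly one relevant document.

    With a single relevant document the ideal DCG is 1, so the score
    reduces to 1 / log2(rank + 1) if the document is in the top k.
    """
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0


# Toy 2-dimensional embeddings (the real model produces 312-dimensional ones).
query = [1.0, 0.0]
docs = {"a": [0.0, 1.0], "b": [1.0, 1.0], "c": [2.0, 0.1]}

# Rank documents by descending cosine similarity to the query.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
score = ndcg_at_k_single_relevant(ranked, relevant_id="b")
print(ranked, round(score, 4))
```

Averaging this per-query score over all evaluation queries would give a single number comparable in kind to the `cosine_ndcg@10` column.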

biencoder-checkpoints/checkpoint-tinybiobert/model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b5f745c82f787b597535ef55de93eb76aca5f6b4e78d903ec53201809f82da99
+size 55504328
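
The LFS pointer size can be sanity-checked against `config.json`. The sketch below counts parameters for a standard BERT encoder using the values in this commit and converts them to bytes at `float32`. It assumes the usual `BertModel` tensor layout, including the pooler head, and that every tensor is stored as `float32`. The safetensors file is not inspected here, so reading the leftover bytes as the file header is a guess.

```python
# Values copied from config.json and the model.safetensors LFS pointer.
VOCAB, HIDDEN, INTERMEDIATE = 28_996, 312, 1_200
LAYERS, MAX_POS, TYPE_VOCAB = 4, 512, 2
FILE_SIZE = 55_504_328  # bytes


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


LAYER_NORM = 2 * HIDDEN  # scale and shift vectors

# Word, position and token-type embeddings, followed by a LayerNorm.
embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + LAYER_NORM

# One encoder layer: Q, K, V and output projections, then the feed-forward
# block, each followed by a LayerNorm.
per_layer = (
    4 * linear(HIDDEN, HIDDEN) + LAYER_NORM
    + linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + LAYER_NORM
)

# Assumption: the checkpoint keeps BertModel's pooler dense layer.
pooler = linear(HIDDEN, HIDDEN)

total_params = embeddings + LAYERS * per_layer + pooler
weight_bytes = total_params * 4  # float32 is 4 bytes per parameter
leftover = FILE_SIZE - weight_bytes

print(total_params)  # 13874136
print(weight_bytes)  # 55496544
print(leftover)      # 7784
```

Under these assumptions the weights account for all but a few kilobytes of the file, and the word embeddings alone (28996 × 312) make up roughly two thirds of the parameters.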

biencoder-checkpoints/checkpoint-tinybiobert/modules.json ADDED Viewed

	@@ -0,0 +1,14 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  }
+]
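
`modules.json` chains a `Transformer` module into a `Pooling` module, and `1_Pooling/config.json` in this commit enables only `pooling_mode_mean_tokens`. Below is a rough pure-Python sketch of what mean pooling does to one sequence. It is an illustration rather than the library's implementation, and the 2-dimensional vectors are toy stand-ins for the model's 312-dimensional token embeddings.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    Padding positions (mask 0) are skipped so they do not dilute
    the sentence embedding.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding sequence to avoid dividing by zero.
    return [s / max(count, 1) for s in summed]


# Two real tokens followed by one padding position.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

The padded third vector has no effect on the result, which is why the attention mask has to travel with the token embeddings into the pooling step.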

biencoder-checkpoints/checkpoint-tinybiobert/rng_state.pth ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:758791b468924176dbc0c0cf74d95b3fd8358e802d4783f599c1882fedafdb93
+size 14244

biencoder-checkpoints/checkpoint-tinybiobert/scaler.pt ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3076546cf27c1d08da2eadd3cb8ac8adfe684c0e523d3df7772a350641740fef
+size 988

biencoder-checkpoints/checkpoint-tinybiobert/scheduler.pt ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f6c588a45dcdcf37531cdcf6d4fba104a3658213b98d027a148ce4cef470750e
+size 1064

biencoder-checkpoints/checkpoint-tinybiobert/sentence_bert_config.json ADDED Viewed

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 512,
+  "do_lower_case": false
+}

biencoder-checkpoints/checkpoint-tinybiobert/special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

biencoder-checkpoints/checkpoint-tinybiobert/tokenizer.json ADDED Viewed

The diff for this file is too large to render. See raw diff

biencoder-checkpoints/checkpoint-tinybiobert/tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": false,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
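
The special-token ids in `added_tokens_decoder` determine how each input is framed before it reaches the encoder. As a hedged illustration, the sketch below wraps a list of wordpiece ids as `[CLS] ... [SEP]`, truncates to `model_max_length`, and pads on the right, which is assumed here to match the usual BERT tokenizer default. The `frame` helper and the wordpiece ids 2000 to 2002 are hypothetical and exist only for this example.

```python
# Special-token ids copied from `added_tokens_decoder` in tokenizer_config.json.
PAD_ID, UNK_ID, CLS_ID, SEP_ID, MASK_ID = 0, 100, 101, 102, 103
MODEL_MAX_LENGTH = 512  # `model_max_length` in tokenizer_config.json


def frame(wordpiece_ids, pad_to):
    """Hypothetical helper: build input ids and an attention mask.

    Leaves room for [CLS] and [SEP] when truncating, then right-pads
    with [PAD] up to `pad_to` positions.
    """
    body = wordpiece_ids[: MODEL_MAX_LENGTH - 2]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)
    padding = max(pad_to - len(ids), 0)
    return ids + [PAD_ID] * padding, mask + [0] * padding


# Placeholder wordpiece ids, not real vocabulary lookups.
ids, mask = frame([2000, 2001, 2002], pad_to=8)
print(ids)   # [101, 2000, 2001, 2002, 102, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The attention mask produced here is what a mean-pooling layer would use to ignore the `[PAD]` positions.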

biencoder-checkpoints/checkpoint-tinybiobert/trainer_state.json ADDED Viewed

	@@ -0,0 +1,941 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 5.0,
+  "eval_steps": 1000,
+  "global_step": 25990,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.09619084263178146,
+      "grad_norm": 5.741635322570801,
+      "learning_rate": 9.980000000000001e-06,
+      "loss": 1.8469,
+      "step": 500
+    },
+    {
+      "epoch": 0.19238168526356292,
+      "grad_norm": 5.087311744689941,
+      "learning_rate": 1.9980000000000002e-05,
+      "loss": 0.301,
+      "step": 1000
+    },
+    {
+      "epoch": 0.19238168526356292,
+      "eval_cosine_accuracy@1": 0.4003615417611373,
+      "eval_cosine_accuracy@10": 0.7401495189656229,
+      "eval_cosine_accuracy@3": 0.5782523438936209,
+      "eval_cosine_accuracy@5": 0.6526441571174705,
+      "eval_cosine_map@100": 0.5158702186798423,
+      "eval_cosine_mrr@10": 0.5077505697419804,
+      "eval_cosine_ndcg@10": 0.5635344391210978,
+      "eval_cosine_precision@1": 0.4003615417611373,
+      "eval_cosine_precision@10": 0.07401495189656228,
+      "eval_cosine_precision@3": 0.19275078129787365,
+      "eval_cosine_precision@5": 0.1305288314234941,
+      "eval_cosine_recall@1": 0.4003615417611373,
+      "eval_cosine_recall@10": 0.7401495189656229,
+      "eval_cosine_recall@3": 0.5782523438936209,
+      "eval_cosine_recall@5": 0.6526441571174705,
+      "eval_runtime": 37.6057,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 1000
+    },
+    {
+      "epoch": 0.2885725278953444,
+      "grad_norm": 2.9921863079071045,
+      "learning_rate": 1.9600640256102444e-05,
+      "loss": 0.2186,
+      "step": 1500
+    },
+    {
+      "epoch": 0.38476337052712584,
+      "grad_norm": 4.068124771118164,
+      "learning_rate": 1.9200480192076832e-05,
+      "loss": 0.1915,
+      "step": 2000
+    },
+    {
+      "epoch": 0.38476337052712584,
+      "eval_cosine_accuracy@1": 0.44365463570071695,
+      "eval_cosine_accuracy@10": 0.7951161223114162,
+      "eval_cosine_accuracy@3": 0.6363441387339911,
+      "eval_cosine_accuracy@5": 0.7084380170353576,
+      "eval_cosine_map@100": 0.5644053261135765,
+      "eval_cosine_mrr@10": 0.5572720521507379,
+      "eval_cosine_ndcg@10": 0.6145320896671304,
+      "eval_cosine_precision@1": 0.44365463570071695,
+      "eval_cosine_precision@10": 0.07951161223114162,
+      "eval_cosine_precision@3": 0.21211471291133036,
+      "eval_cosine_precision@5": 0.1416876034070715,
+      "eval_cosine_recall@1": 0.44365463570071695,
+      "eval_cosine_recall@10": 0.7951161223114162,
+      "eval_cosine_recall@3": 0.6363441387339911,
+      "eval_cosine_recall@5": 0.7084380170353576,
+      "eval_runtime": 37.3288,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 2000
+    },
+    {
+      "epoch": 0.4809542131589073,
+      "grad_norm": 3.5689873695373535,
+      "learning_rate": 1.880032012805122e-05,
+      "loss": 0.1615,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5771450557906888,
+      "grad_norm": 1.5700035095214844,
+      "learning_rate": 1.8400160064025612e-05,
+      "loss": 0.1504,
+      "step": 3000
+    },
+    {
+      "epoch": 0.5771450557906888,
+      "eval_cosine_accuracy@1": 0.46583736748575283,
+      "eval_cosine_accuracy@10": 0.8197193455481341,
+      "eval_cosine_accuracy@3": 0.6656045100802745,
+      "eval_cosine_accuracy@5": 0.7403946320240211,
+      "eval_cosine_map@100": 0.5887217637289748,
+      "eval_cosine_mrr@10": 0.5820531404138076,
+      "eval_cosine_ndcg@10": 0.6394519654007175,
+      "eval_cosine_precision@1": 0.46583736748575283,
+      "eval_cosine_precision@10": 0.0819719345548134,
+      "eval_cosine_precision@3": 0.22186817002675815,
+      "eval_cosine_precision@5": 0.14807892640480422,
+      "eval_cosine_recall@1": 0.46583736748575283,
+      "eval_cosine_recall@10": 0.8197193455481341,
+      "eval_cosine_recall@3": 0.6656045100802745,
+      "eval_cosine_recall@5": 0.7403946320240211,
+      "eval_runtime": 37.3165,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6733358984224702,
+      "grad_norm": 2.3708999156951904,
+      "learning_rate": 1.8e-05,
+      "loss": 0.1451,
+      "step": 3500
+    },
+    {
+      "epoch": 0.7695267410542517,
+      "grad_norm": 2.4959521293640137,
+      "learning_rate": 1.7599839935974392e-05,
+      "loss": 0.1365,
+      "step": 4000
+    },
+    {
+      "epoch": 0.7695267410542517,
+      "eval_cosine_accuracy@1": 0.4831791163674245,
+      "eval_cosine_accuracy@10": 0.8363870335192107,
+      "eval_cosine_accuracy@3": 0.6838654329309394,
+      "eval_cosine_accuracy@5": 0.7581346896255898,
+      "eval_cosine_map@100": 0.6058547996468352,
+      "eval_cosine_mrr@10": 0.5995932655187352,
+      "eval_cosine_ndcg@10": 0.6568315628791503,
+      "eval_cosine_precision@1": 0.4831791163674245,
+      "eval_cosine_precision@10": 0.08363870335192108,
+      "eval_cosine_precision@3": 0.2279551443103131,
+      "eval_cosine_precision@5": 0.15162693792511794,
+      "eval_cosine_recall@1": 0.4831791163674245,
+      "eval_cosine_recall@10": 0.8363870335192107,
+      "eval_cosine_recall@3": 0.6838654329309394,
+      "eval_cosine_recall@5": 0.7581346896255898,
+      "eval_runtime": 37.9907,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 4000
+    },
+    {
+      "epoch": 0.8657175836860331,
+      "grad_norm": 3.29150652885437,
+      "learning_rate": 1.720048019207683e-05,
+      "loss": 0.1247,
+      "step": 4500
+    },
+    {
+      "epoch": 0.9619084263178146,
+      "grad_norm": 1.8889851570129395,
+      "learning_rate": 1.6800320128051223e-05,
+      "loss": 0.126,
+      "step": 5000
+    },
+    {
+      "epoch": 0.9619084263178146,
+      "eval_cosine_accuracy@1": 0.49175807341136096,
+      "eval_cosine_accuracy@10": 0.846896255898033,
+      "eval_cosine_accuracy@3": 0.6944972118389607,
+      "eval_cosine_accuracy@5": 0.7665604510080275,
+      "eval_cosine_map@100": 0.6151092171416574,
+      "eval_cosine_mrr@10": 0.6091163941729371,
+      "eval_cosine_ndcg@10": 0.6665806678949275,
+      "eval_cosine_precision@1": 0.49175807341136096,
+      "eval_cosine_precision@10": 0.0846896255898033,
+      "eval_cosine_precision@3": 0.23149907061298688,
+      "eval_cosine_precision@5": 0.15331209020160547,
+      "eval_cosine_recall@1": 0.49175807341136096,
+      "eval_cosine_recall@10": 0.846896255898033,
+      "eval_cosine_recall@3": 0.6944972118389607,
+      "eval_cosine_recall@5": 0.7665604510080275,
+      "eval_runtime": 37.3193,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 5000
+    },
+    {
+      "epoch": 1.058099268949596,
+      "grad_norm": 2.8940165042877197,
+      "learning_rate": 1.640016006402561e-05,
+      "loss": 0.1102,
+      "step": 5500
+    },
+    {
+      "epoch": 1.1542901115813775,
+      "grad_norm": 3.093466281890869,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 0.1075,
+      "step": 6000
+    },
+    {
+      "epoch": 1.1542901115813775,
+      "eval_cosine_accuracy@1": 0.49788589987131565,
+      "eval_cosine_accuracy@10": 0.8543109259145781,
+      "eval_cosine_accuracy@3": 0.7028004166921993,
+      "eval_cosine_accuracy@5": 0.7760279428886574,
+      "eval_cosine_map@100": 0.6222303929011103,
+      "eval_cosine_mrr@10": 0.6164308912486005,
+      "eval_cosine_ndcg@10": 0.6739777946779848,
+      "eval_cosine_precision@1": 0.49788589987131565,
+      "eval_cosine_precision@10": 0.08543109259145781,
+      "eval_cosine_precision@3": 0.2342668055640664,
+      "eval_cosine_precision@5": 0.15520558857773148,
+      "eval_cosine_recall@1": 0.49788589987131565,
+      "eval_cosine_recall@10": 0.8543109259145781,
+      "eval_cosine_recall@3": 0.7028004166921993,
+      "eval_cosine_recall@5": 0.7760279428886574,
+      "eval_runtime": 38.0227,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 6000
+    },
+    {
+      "epoch": 1.250480954213159,
+      "grad_norm": 1.681726098060608,
+      "learning_rate": 1.560064025610244e-05,
+      "loss": 0.1025,
+      "step": 6500
+    },
+    {
+      "epoch": 1.3466717968449404,
+      "grad_norm": 1.8751877546310425,
+      "learning_rate": 1.5200480192076832e-05,
+      "loss": 0.1011,
+      "step": 7000
+    },
+    {
+      "epoch": 1.3466717968449404,
+      "eval_cosine_accuracy@1": 0.5032477480237759,
+      "eval_cosine_accuracy@10": 0.8573442000122556,
+      "eval_cosine_accuracy@3": 0.707702677860163,
+      "eval_cosine_accuracy@5": 0.7806544518659232,
+      "eval_cosine_map@100": 0.6268520263379107,
+      "eval_cosine_mrr@10": 0.6210327794945518,
+      "eval_cosine_ndcg@10": 0.6782188559387655,
+      "eval_cosine_precision@1": 0.5032477480237759,
+      "eval_cosine_precision@10": 0.08573442000122555,
+      "eval_cosine_precision@3": 0.2359008926200543,
+      "eval_cosine_precision@5": 0.15613089037318464,
+      "eval_cosine_recall@1": 0.5032477480237759,
+      "eval_cosine_recall@10": 0.8573442000122556,
+      "eval_cosine_recall@3": 0.707702677860163,
+      "eval_cosine_recall@5": 0.7806544518659232,
+      "eval_runtime": 37.2837,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 7000
+    },
+    {
+      "epoch": 1.4428626394767219,
+      "grad_norm": 1.1830039024353027,
+      "learning_rate": 1.4800320128051222e-05,
+      "loss": 0.099,
+      "step": 7500
+    },
+    {
+      "epoch": 1.5390534821085033,
+      "grad_norm": 2.8491501808166504,
+      "learning_rate": 1.4400160064025611e-05,
+      "loss": 0.0961,
+      "step": 8000
+    },
+    {
+      "epoch": 1.5390534821085033,
+      "eval_cosine_accuracy@1": 0.5155034009436853,
+      "eval_cosine_accuracy@10": 0.8688338746246707,
+      "eval_cosine_accuracy@3": 0.7188246828849807,
+      "eval_cosine_accuracy@5": 0.7922360438752375,
+      "eval_cosine_map@100": 0.6387374052710177,
+      "eval_cosine_mrr@10": 0.6333171505217987,
+      "eval_cosine_ndcg@10": 0.6903269438115052,
+      "eval_cosine_precision@1": 0.5155034009436853,
+      "eval_cosine_precision@10": 0.08688338746246706,
+      "eval_cosine_precision@3": 0.23960822762832687,
+      "eval_cosine_precision@5": 0.1584472087750475,
+      "eval_cosine_recall@1": 0.5155034009436853,
+      "eval_cosine_recall@10": 0.8688338746246707,
+      "eval_cosine_recall@3": 0.7188246828849807,
+      "eval_cosine_recall@5": 0.7922360438752375,
+      "eval_runtime": 37.1861,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 8000
+    },
+    {
+      "epoch": 1.6352443247402846,
+      "grad_norm": 2.94862699508667,
+      "learning_rate": 1.4000800320128052e-05,
+      "loss": 0.0902,
+      "step": 8500
+    },
+    {
+      "epoch": 1.7314351673720663,
+      "grad_norm": 2.007293462753296,
+      "learning_rate": 1.3600640256102442e-05,
+      "loss": 0.0914,
+      "step": 9000
+    },
+    {
+      "epoch": 1.7314351673720663,
+      "eval_cosine_accuracy@1": 0.5171272749555733,
+      "eval_cosine_accuracy@10": 0.8689257920215699,
+      "eval_cosine_accuracy@3": 0.7210000612782645,
+      "eval_cosine_accuracy@5": 0.7930939395796311,
+      "eval_cosine_map@100": 0.6402035789072501,
+      "eval_cosine_mrr@10": 0.6347871236858099,
+      "eval_cosine_ndcg@10": 0.6914745561225969,
+      "eval_cosine_precision@1": 0.5171272749555733,
+      "eval_cosine_precision@10": 0.086892579202157,
+      "eval_cosine_precision@3": 0.2403333537594215,
+      "eval_cosine_precision@5": 0.15861878791592623,
+      "eval_cosine_recall@1": 0.5171272749555733,
+      "eval_cosine_recall@10": 0.8689257920215699,
+      "eval_cosine_recall@3": 0.7210000612782645,
+      "eval_cosine_recall@5": 0.7930939395796311,
+      "eval_runtime": 37.3089,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 9000
+    },
+    {
+      "epoch": 1.8276260100038475,
+      "grad_norm": 1.6253125667572021,
+      "learning_rate": 1.3200480192076832e-05,
+      "loss": 0.0894,
+      "step": 9500
+    },
+    {
+      "epoch": 1.9238168526356292,
+      "grad_norm": 2.037698984146118,
+      "learning_rate": 1.2800320128051222e-05,
+      "loss": 0.0881,
+      "step": 10000
+    },
+    {
+      "epoch": 1.9238168526356292,
+      "eval_cosine_accuracy@1": 0.5226729579018322,
+      "eval_cosine_accuracy@10": 0.8739812488510326,
+      "eval_cosine_accuracy@3": 0.7287211226178074,
+      "eval_cosine_accuracy@5": 0.8007230835222746,
+      "eval_cosine_map@100": 0.6459750840989936,
+      "eval_cosine_mrr@10": 0.64070738218282,
+      "eval_cosine_ndcg@10": 0.6972440512806575,
+      "eval_cosine_precision@1": 0.5226729579018322,
+      "eval_cosine_precision@10": 0.08739812488510325,
+      "eval_cosine_precision@3": 0.24290704087260248,
+      "eval_cosine_precision@5": 0.16014461670445493,
+      "eval_cosine_recall@1": 0.5226729579018322,
+      "eval_cosine_recall@10": 0.8739812488510326,
+      "eval_cosine_recall@3": 0.7287211226178074,
+      "eval_cosine_recall@5": 0.8007230835222746,
+      "eval_runtime": 37.9487,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 10000
+    },
+    {
+      "epoch": 2.0200076952674104,
+      "grad_norm": 1.377611517906189,
+      "learning_rate": 1.2400960384153663e-05,
+      "loss": 0.0848,
+      "step": 10500
+    },
+    {
+      "epoch": 2.116198537899192,
+      "grad_norm": 1.8393025398254395,
+      "learning_rate": 1.2000800320128053e-05,
+      "loss": 0.0779,
+      "step": 11000
+    },
+    {
+      "epoch": 2.116198537899192,
+      "eval_cosine_accuracy@1": 0.5261964581163061,
+      "eval_cosine_accuracy@10": 0.8776885838593051,
+      "eval_cosine_accuracy@3": 0.7318156749800846,
+      "eval_cosine_accuracy@5": 0.8031435749739567,
+      "eval_cosine_map@100": 0.6493668761542414,
+      "eval_cosine_mrr@10": 0.64421845652697,
+      "eval_cosine_ndcg@10": 0.7007940310013228,
+      "eval_cosine_precision@1": 0.5261964581163061,
+      "eval_cosine_precision@10": 0.0877688583859305,
+      "eval_cosine_precision@3": 0.24393855832669484,
+      "eval_cosine_precision@5": 0.16062871499479137,
+      "eval_cosine_recall@1": 0.5261964581163061,
+      "eval_cosine_recall@10": 0.8776885838593051,
+      "eval_cosine_recall@3": 0.7318156749800846,
+      "eval_cosine_recall@5": 0.8031435749739567,
+      "eval_runtime": 37.6496,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 11000
+    },
+    {
+      "epoch": 2.2123893805309733,
+      "grad_norm": 2.0715949535369873,
+      "learning_rate": 1.1600640256102443e-05,
+      "loss": 0.0756,
+      "step": 11500
+    },
+    {
+      "epoch": 2.308580223162755,
+      "grad_norm": 3.181828498840332,
+      "learning_rate": 1.1200480192076833e-05,
+      "loss": 0.075,
+      "step": 12000
+    },
+    {
+      "epoch": 2.308580223162755,
+      "eval_cosine_accuracy@1": 0.5265028494393039,
+      "eval_cosine_accuracy@10": 0.8780256143146026,
+      "eval_cosine_accuracy@3": 0.7324897358906796,
+      "eval_cosine_accuracy@5": 0.8056253446902384,
+      "eval_cosine_map@100": 0.6502720334254667,
+      "eval_cosine_mrr@10": 0.6450791729768786,
+      "eval_cosine_ndcg@10": 0.7015823044400388,
+      "eval_cosine_precision@1": 0.5265028494393039,
+      "eval_cosine_precision@10": 0.08780256143146026,
+      "eval_cosine_precision@3": 0.24416324529689318,
+      "eval_cosine_precision@5": 0.16112506893804765,
+      "eval_cosine_recall@1": 0.5265028494393039,
+      "eval_cosine_recall@10": 0.8780256143146026,
+      "eval_cosine_recall@3": 0.7324897358906796,
+      "eval_cosine_recall@5": 0.8056253446902384,
+      "eval_runtime": 38.432,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 12000
+    },
+    {
+      "epoch": 2.4047710657945363,
+      "grad_norm": 2.3642358779907227,
+      "learning_rate": 1.0801120448179271e-05,
+      "loss": 0.0785,
+      "step": 12500
+    },
+    {
+      "epoch": 2.500961908426318,
+      "grad_norm": 1.315789818763733,
+      "learning_rate": 1.0400960384153661e-05,
+      "loss": 0.0744,
+      "step": 13000
+    },
+    {
+      "epoch": 2.500961908426318,
+      "eval_cosine_accuracy@1": 0.5271769103498989,
+      "eval_cosine_accuracy@10": 0.880415466633985,
+      "eval_cosine_accuracy@3": 0.7335314663888719,
+      "eval_cosine_accuracy@5": 0.8069121882468289,
+      "eval_cosine_map@100": 0.6508831760712028,
+      "eval_cosine_mrr@10": 0.6458145729440025,
+      "eval_cosine_ndcg@10": 0.7026735805144717,
+      "eval_cosine_precision@1": 0.5271769103498989,
+      "eval_cosine_precision@10": 0.08804154666339849,
+      "eval_cosine_precision@3": 0.2445104887962906,
+      "eval_cosine_precision@5": 0.16138243764936577,
+      "eval_cosine_recall@1": 0.5271769103498989,
+      "eval_cosine_recall@10": 0.880415466633985,
+      "eval_cosine_recall@3": 0.7335314663888719,
+      "eval_cosine_recall@5": 0.8069121882468289,
+      "eval_runtime": 37.3573,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 13000
+    },
+    {
+      "epoch": 2.597152751058099,
+      "grad_norm": 2.1482696533203125,
+      "learning_rate": 1.0000800320128053e-05,
+      "loss": 0.0739,
+      "step": 13500
+    },
+    {
+      "epoch": 2.693343593689881,
+      "grad_norm": 1.6254030466079712,
+      "learning_rate": 9.600640256102441e-06,
+      "loss": 0.0741,
+      "step": 14000
+    },
+    {
+      "epoch": 2.693343593689881,
+      "eval_cosine_accuracy@1": 0.5324468411054599,
+      "eval_cosine_accuracy@10": 0.8847968625528525,
+      "eval_cosine_accuracy@3": 0.7399044059072247,
+      "eval_cosine_accuracy@5": 0.811079110239598,
+      "eval_cosine_map@100": 0.6559858785481972,
+      "eval_cosine_mrr@10": 0.6510460710419446,
+      "eval_cosine_ndcg@10": 0.7077137677382884,
+      "eval_cosine_precision@1": 0.5324468411054599,
+      "eval_cosine_precision@10": 0.08847968625528524,
+      "eval_cosine_precision@3": 0.2466348019690749,
+      "eval_cosine_precision@5": 0.16221582204791962,
+      "eval_cosine_recall@1": 0.5324468411054599,
+      "eval_cosine_recall@10": 0.8847968625528525,
+      "eval_cosine_recall@3": 0.7399044059072247,
+      "eval_cosine_recall@5": 0.811079110239598,
+      "eval_runtime": 37.4282,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 14000
+    },
+    {
+      "epoch": 2.789534436321662,
+      "grad_norm": 1.0776854753494263,
+      "learning_rate": 9.201280512204884e-06,
+      "loss": 0.0704,
+      "step": 14500
+    },
+    {
+      "epoch": 2.8857252789534438,
+      "grad_norm": 2.2815675735473633,
+      "learning_rate": 8.801120448179272e-06,
+      "loss": 0.074,
+      "step": 15000
+    },
+    {
+      "epoch": 2.8857252789534438,
+      "eval_cosine_accuracy@1": 0.5350511673509406,
+      "eval_cosine_accuracy@10": 0.8862369017709418,
+      "eval_cosine_accuracy@3": 0.7405784668178197,
+      "eval_cosine_accuracy@5": 0.813101292971383,
+      "eval_cosine_map@100": 0.658100408637169,
+      "eval_cosine_mrr@10": 0.6532255589696401,
+      "eval_cosine_ndcg@10": 0.7097064138010793,
+      "eval_cosine_precision@1": 0.5350511673509406,
+      "eval_cosine_precision@10": 0.08862369017709419,
+      "eval_cosine_precision@3": 0.24685948893927323,
+      "eval_cosine_precision@5": 0.16262025859427662,
+      "eval_cosine_recall@1": 0.5350511673509406,
+      "eval_cosine_recall@10": 0.8862369017709418,
+      "eval_cosine_recall@3": 0.7405784668178197,
+      "eval_cosine_recall@5": 0.813101292971383,
+      "eval_runtime": 37.393,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 15000
+    },
+    {
+      "epoch": 2.981916121585225,
+      "grad_norm": 1.556275486946106,
+      "learning_rate": 8.400960384153662e-06,
+      "loss": 0.0696,
+      "step": 15500
+    },
+    {
+      "epoch": 3.0781069642170067,
+      "grad_norm": 2.5441489219665527,
+      "learning_rate": 8.000800320128052e-06,
+      "loss": 0.0663,
+      "step": 16000
+    },
+    {
+      "epoch": 3.0781069642170067,
+      "eval_cosine_accuracy@1": 0.5396163980636068,
+      "eval_cosine_accuracy@10": 0.8885654758257246,
+      "eval_cosine_accuracy@3": 0.7458790367056805,
+      "eval_cosine_accuracy@5": 0.8164409583920583,
+      "eval_cosine_map@100": 0.6622726196460629,
+      "eval_cosine_mrr@10": 0.6574670605011067,
+      "eval_cosine_ndcg@10": 0.7135277681055804,
+      "eval_cosine_precision@1": 0.5396163980636068,
+      "eval_cosine_precision@10": 0.08885654758257244,
+      "eval_cosine_precision@3": 0.24862634556856014,
+      "eval_cosine_precision@5": 0.1632881916784117,
+      "eval_cosine_recall@1": 0.5396163980636068,
+      "eval_cosine_recall@10": 0.8885654758257246,
+      "eval_cosine_recall@3": 0.7458790367056805,
+      "eval_cosine_recall@5": 0.8164409583920583,
+      "eval_runtime": 37.3857,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 16000
+    },
+    {
+      "epoch": 3.174297806848788,
+      "grad_norm": 0.8971004486083984,
+      "learning_rate": 7.600640256102442e-06,
+      "loss": 0.0656,
+      "step": 16500
+    },
+    {
+      "epoch": 3.2704886494805696,
+      "grad_norm": 1.499013900756836,
+      "learning_rate": 7.2012805122048825e-06,
+      "loss": 0.0634,
+      "step": 17000
+    },
+    {
+      "epoch": 3.2704886494805696,
+      "eval_cosine_accuracy@1": 0.537441019670323,
+      "eval_cosine_accuracy@10": 0.8884122801642258,
+      "eval_cosine_accuracy@3": 0.7433972669893989,
+      "eval_cosine_accuracy@5": 0.8161039279367608,
+      "eval_cosine_map@100": 0.6605980760721326,
+      "eval_cosine_mrr@10": 0.6558092645927442,
+      "eval_cosine_ndcg@10": 0.7122137222786563,
+      "eval_cosine_precision@1": 0.537441019670323,
+      "eval_cosine_precision@10": 0.08884122801642257,
+      "eval_cosine_precision@3": 0.24779908899646627,
+      "eval_cosine_precision@5": 0.1632207855873522,
+      "eval_cosine_recall@1": 0.537441019670323,
+      "eval_cosine_recall@10": 0.8884122801642258,
+      "eval_cosine_recall@3": 0.7433972669893989,
+      "eval_cosine_recall@5": 0.8161039279367608,
+      "eval_runtime": 37.408,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 17000
+    },
+    {
+      "epoch": 3.366679492112351,
+      "grad_norm": 0.85924232006073,
+      "learning_rate": 6.8011204481792725e-06,
+      "loss": 0.0639,
+      "step": 17500
+    },
+    {
+      "epoch": 3.4628703347441325,
+      "grad_norm": 1.6038638353347778,
+      "learning_rate": 6.400960384153662e-06,
+      "loss": 0.0657,
+      "step": 18000
+    },
+    {
+      "epoch": 3.4628703347441325,
+      "eval_cosine_accuracy@1": 0.5418224155891905,
+      "eval_cosine_accuracy@10": 0.8918132238495006,
+      "eval_cosine_accuracy@3": 0.7472271585268705,
+      "eval_cosine_accuracy@5": 0.8188308107114407,
+      "eval_cosine_map@100": 0.6642658993896512,
+      "eval_cosine_mrr@10": 0.6595858780834957,
+      "eval_cosine_ndcg@10": 0.7158873591426906,
+      "eval_cosine_precision@1": 0.5418224155891905,
+      "eval_cosine_precision@10": 0.08918132238495005,
+      "eval_cosine_precision@3": 0.2490757195089568,
+      "eval_cosine_precision@5": 0.16376616214228812,
+      "eval_cosine_recall@1": 0.5418224155891905,
+      "eval_cosine_recall@10": 0.8918132238495006,
+      "eval_cosine_recall@3": 0.7472271585268705,
+      "eval_cosine_recall@5": 0.8188308107114407,
+      "eval_runtime": 38.4035,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 18000
+    },
+    {
+      "epoch": 3.5590611773759138,
+      "grad_norm": 1.283136010169983,
+      "learning_rate": 6.000800320128052e-06,
+      "loss": 0.0658,
+      "step": 18500
+    },
+    {
+      "epoch": 3.655252020007695,
+      "grad_norm": 0.8852151036262512,
+      "learning_rate": 5.600640256102441e-06,
+      "loss": 0.0627,
+      "step": 19000
+    },
+    {
+      "epoch": 3.655252020007695,
+      "eval_cosine_accuracy@1": 0.5437220417917764,
+      "eval_cosine_accuracy@10": 0.8912617194681046,
+      "eval_cosine_accuracy@3": 0.7493106195232551,
+      "eval_cosine_accuracy@5": 0.820393406458729,
+      "eval_cosine_map@100": 0.6659324483571365,
+      "eval_cosine_mrr@10": 0.661192231861396,
+      "eval_cosine_ndcg@10": 0.7170066447079718,
+      "eval_cosine_precision@1": 0.5437220417917764,
+      "eval_cosine_precision@10": 0.08912617194681045,
+      "eval_cosine_precision@3": 0.24977020650775167,
+      "eval_cosine_precision@5": 0.16407868129174585,
+      "eval_cosine_recall@1": 0.5437220417917764,
+      "eval_cosine_recall@10": 0.8912617194681046,
+      "eval_cosine_recall@3": 0.7493106195232551,
+      "eval_cosine_recall@5": 0.820393406458729,
+      "eval_runtime": 37.4746,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 19000
+    },
+    {
+      "epoch": 3.7514428626394767,
+      "grad_norm": 2.4619545936584473,
+      "learning_rate": 5.200480192076831e-06,
+      "loss": 0.0648,
+      "step": 19500
+    },
+    {
+      "epoch": 3.8476337052712584,
+      "grad_norm": 1.4234368801116943,
+      "learning_rate": 4.801120448179272e-06,
+      "loss": 0.0638,
+      "step": 20000
+    },
+    {
+      "epoch": 3.8476337052712584,
+      "eval_cosine_accuracy@1": 0.5417917764568907,
+      "eval_cosine_accuracy@10": 0.8925792021569949,
+      "eval_cosine_accuracy@3": 0.7492799803909553,
+      "eval_cosine_accuracy@5": 0.8206997977817269,
+      "eval_cosine_map@100": 0.6648556804097842,
+      "eval_cosine_mrr@10": 0.6602137736030805,
+      "eval_cosine_ndcg@10": 0.7165749273596944,
+      "eval_cosine_precision@1": 0.5417917764568907,
+      "eval_cosine_precision@10": 0.08925792021569949,
+      "eval_cosine_precision@3": 0.24975999346365177,
+      "eval_cosine_precision@5": 0.16413995955634536,
+      "eval_cosine_recall@1": 0.5417917764568907,
+      "eval_cosine_recall@10": 0.8925792021569949,
+      "eval_cosine_recall@3": 0.7492799803909553,
+      "eval_cosine_recall@5": 0.8206997977817269,
+      "eval_runtime": 37.3088,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 20000
+    },
+    {
+      "epoch": 3.9438245479030396,
+      "grad_norm": 2.059201240539551,
+      "learning_rate": 4.400960384153662e-06,
+      "loss": 0.0613,
+      "step": 20500
+    },
+    {
+      "epoch": 4.040015390534821,
+      "grad_norm": 2.8143606185913086,
+      "learning_rate": 4.000800320128051e-06,
+      "loss": 0.061,
+      "step": 21000
+    },
+    {
+      "epoch": 4.040015390534821,
+      "eval_cosine_accuracy@1": 0.5425577547643851,
+      "eval_cosine_accuracy@10": 0.8922115325693977,
+      "eval_cosine_accuracy@3": 0.7497395673754519,
+      "eval_cosine_accuracy@5": 0.8215270543538207,
+      "eval_cosine_map@100": 0.6656333161549883,
+      "eval_cosine_mrr@10": 0.6609160054936519,
+      "eval_cosine_ndcg@10": 0.7170515802988034,
+      "eval_cosine_precision@1": 0.5425577547643851,
+      "eval_cosine_precision@10": 0.08922115325693976,
+      "eval_cosine_precision@3": 0.24991318912515062,
+      "eval_cosine_precision@5": 0.16430541087076414,
+      "eval_cosine_recall@1": 0.5425577547643851,
+      "eval_cosine_recall@10": 0.8922115325693977,
+      "eval_cosine_recall@3": 0.7497395673754519,
+      "eval_cosine_recall@5": 0.8215270543538207,
+      "eval_runtime": 38.0609,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 21000
+    },
+    {
+      "epoch": 4.136206233166603,
+      "grad_norm": 1.6916602849960327,
+      "learning_rate": 3.6006402561024412e-06,
+      "loss": 0.0583,
+      "step": 21500
+    },
+    {
+      "epoch": 4.232397075798384,
+      "grad_norm": 2.4838967323303223,
+      "learning_rate": 3.2012805122048822e-06,
+      "loss": 0.0602,
+      "step": 22000
+    },
+    {
+      "epoch": 4.232397075798384,
+      "eval_cosine_accuracy@1": 0.5435688461302776,
+      "eval_cosine_accuracy@10": 0.8929775108768919,
+      "eval_cosine_accuracy@3": 0.7499540413015503,
+      "eval_cosine_accuracy@5": 0.8217415282799191,
+      "eval_cosine_map@100": 0.6663101850931815,
+      "eval_cosine_mrr@10": 0.6616394902426588,
+      "eval_cosine_ndcg@10": 0.7177557535908854,
+      "eval_cosine_precision@1": 0.5435688461302776,
+      "eval_cosine_precision@10": 0.0892977510876892,
+      "eval_cosine_precision@3": 0.24998468043385008,
+      "eval_cosine_precision@5": 0.16434830565598385,
+      "eval_cosine_recall@1": 0.5435688461302776,
+      "eval_cosine_recall@10": 0.8929775108768919,
+      "eval_cosine_recall@3": 0.7499540413015503,
+      "eval_cosine_recall@5": 0.8217415282799191,
+      "eval_runtime": 37.3184,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 22000
+    },
+    {
+      "epoch": 4.328587918430165,
+      "grad_norm": 1.5632020235061646,
+      "learning_rate": 2.8011204481792718e-06,
+      "loss": 0.0599,
+      "step": 22500
+    },
+    {
+      "epoch": 4.424778761061947,
+      "grad_norm": 1.0466374158859253,
+      "learning_rate": 2.4009603841536618e-06,
+      "loss": 0.0579,
+      "step": 23000
+    },
+    {
+      "epoch": 4.424778761061947,
+      "eval_cosine_accuracy@1": 0.5439977939824744,
+      "eval_cosine_accuracy@10": 0.8934370978613886,
+      "eval_cosine_accuracy@3": 0.7516391935780379,
+      "eval_cosine_accuracy@5": 0.8220479196029169,
+      "eval_cosine_map@100": 0.6671602335542375,
+      "eval_cosine_mrr@10": 0.6625040609008773,
+      "eval_cosine_ndcg@10": 0.7185422064601371,
+      "eval_cosine_precision@1": 0.5439977939824744,
+      "eval_cosine_precision@10": 0.08934370978613886,
+      "eval_cosine_precision@3": 0.2505463978593459,
+      "eval_cosine_precision@5": 0.16440958392058336,
+      "eval_cosine_recall@1": 0.5439977939824744,
+      "eval_cosine_recall@10": 0.8934370978613886,
+      "eval_cosine_recall@3": 0.7516391935780379,
+      "eval_cosine_recall@5": 0.8220479196029169,
+      "eval_runtime": 37.6028,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 23000
+    },
+    {
+      "epoch": 4.520969603693729,
+      "grad_norm": 2.3420305252075195,
+      "learning_rate": 2.0008003201280513e-06,
+      "loss": 0.0586,
+      "step": 23500
+    },
+    {
+      "epoch": 4.61716044632551,
+      "grad_norm": 1.0525851249694824,
+      "learning_rate": 1.6006402561024411e-06,
+      "loss": 0.061,
+      "step": 24000
+    },
+    {
+      "epoch": 4.61716044632551,
+      "eval_cosine_accuracy@1": 0.5431398982780807,
+      "eval_cosine_accuracy@10": 0.8937741283166861,
+      "eval_cosine_accuracy@3": 0.750413628286047,
+      "eval_cosine_accuracy@5": 0.8224462283228139,
+      "eval_cosine_map@100": 0.6664220156098503,
+      "eval_cosine_mrr@10": 0.6617887952206918,
+      "eval_cosine_ndcg@10": 0.71806761340154,
+      "eval_cosine_precision@1": 0.5431398982780807,
+      "eval_cosine_precision@10": 0.0893774128316686,
+      "eval_cosine_precision@3": 0.250137876095349,
+      "eval_cosine_precision@5": 0.1644892456645628,
+      "eval_cosine_recall@1": 0.5431398982780807,
+      "eval_cosine_recall@10": 0.8937741283166861,
+      "eval_cosine_recall@3": 0.750413628286047,
+      "eval_cosine_recall@5": 0.8224462283228139,
+      "eval_runtime": 38.033,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 24000
+    },
+    {
+      "epoch": 4.713351288957291,
+      "grad_norm": 0.856259286403656,
+      "learning_rate": 1.201280512204882e-06,
+      "loss": 0.0591,
+      "step": 24500
+    },
+    {
+      "epoch": 4.8095421315890725,
+      "grad_norm": 2.3736488819122314,
+      "learning_rate": 8.011204481792719e-07,
+      "loss": 0.0568,
+      "step": 25000
+    },
+    {
+      "epoch": 4.8095421315890725,
+      "eval_cosine_accuracy@1": 0.5447024940253692,
+      "eval_cosine_accuracy@10": 0.893927323978185,
+      "eval_cosine_accuracy@3": 0.751761750107237,
+      "eval_cosine_accuracy@5": 0.8228138979104112,
+      "eval_cosine_map@100": 0.6674861654698347,
+      "eval_cosine_mrr@10": 0.6628477055180666,
+      "eval_cosine_ndcg@10": 0.7189132708892791,
+      "eval_cosine_precision@1": 0.5447024940253692,
+      "eval_cosine_precision@10": 0.08939273239781849,
+      "eval_cosine_precision@3": 0.2505872500357456,
+      "eval_cosine_precision@5": 0.16456277958208224,
+      "eval_cosine_recall@1": 0.5447024940253692,
+      "eval_cosine_recall@10": 0.893927323978185,
+      "eval_cosine_recall@3": 0.751761750107237,
+      "eval_cosine_recall@5": 0.8228138979104112,
+      "eval_runtime": 37.6063,
+      "eval_samples_per_second": 0.0,
+      "eval_steps_per_second": 0.0,
+      "step": 25000
+    },
+    {
+      "epoch": 4.905732974220854,
+      "grad_norm": 0.7087424397468567,
+      "learning_rate": 4.009603841536615e-07,
+      "loss": 0.057,
+      "step": 25500
+    }
+  ],
+  "logging_steps": 500,
+  "max_steps": 25990,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 1000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 64,
+  "trial_name": null,
+  "trial_params": null
+}

biencoder-checkpoints/checkpoint-tinybiobert/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2583d3517ab53755d75547370ad94b89e3ea203066a5969526881be0e9c58c83
+size 5560

biencoder-checkpoints/checkpoint-tinybiobert/vocab.txt ADDED

The diff for this file is too large to render. See raw diff