ayuwal12 committed
Commit 6690520 · verified · 1 Parent(s): 9a76225

Upload LoRA fine-tuned BioMistral-7B model

This view is limited to 50 files because it contains too many changes. See the raw diff for the full change set.
Files changed (50)
  1. README.md +276 -0
  2. adapter_config.json +41 -0
  3. adapter_model.safetensors +3 -0
  4. chat_template.jinja +1 -0
  5. checkpoint-1000/README.md +207 -0
  6. checkpoint-1000/adapter_config.json +41 -0
  7. checkpoint-1000/adapter_model.safetensors +3 -0
  8. checkpoint-1000/chat_template.jinja +1 -0
  9. checkpoint-1000/optimizer.pt +3 -0
  10. checkpoint-1000/rng_state.pth +3 -0
  11. checkpoint-1000/scaler.pt +3 -0
  12. checkpoint-1000/scheduler.pt +3 -0
  13. checkpoint-1000/special_tokens_map.json +24 -0
  14. checkpoint-1000/tokenizer.json +0 -0
  15. checkpoint-1000/tokenizer.model +3 -0
  16. checkpoint-1000/tokenizer_config.json +44 -0
  17. checkpoint-1000/trainer_state.json +750 -0
  18. checkpoint-1000/training_args.bin +3 -0
  19. checkpoint-1500/README.md +207 -0
  20. checkpoint-1500/adapter_config.json +41 -0
  21. checkpoint-1500/adapter_model.safetensors +3 -0
  22. checkpoint-1500/chat_template.jinja +1 -0
  23. checkpoint-1500/optimizer.pt +3 -0
  24. checkpoint-1500/rng_state.pth +3 -0
  25. checkpoint-1500/scaler.pt +3 -0
  26. checkpoint-1500/scheduler.pt +3 -0
  27. checkpoint-1500/special_tokens_map.json +24 -0
  28. checkpoint-1500/tokenizer.json +0 -0
  29. checkpoint-1500/tokenizer.model +3 -0
  30. checkpoint-1500/tokenizer_config.json +44 -0
  31. checkpoint-1500/trainer_state.json +1108 -0
  32. checkpoint-1500/training_args.bin +3 -0
  33. checkpoint-2000/README.md +207 -0
  34. checkpoint-2000/adapter_config.json +41 -0
  35. checkpoint-2000/adapter_model.safetensors +3 -0
  36. checkpoint-2000/chat_template.jinja +1 -0
  37. checkpoint-2000/optimizer.pt +3 -0
  38. checkpoint-2000/rng_state.pth +3 -0
  39. checkpoint-2000/scaler.pt +3 -0
  40. checkpoint-2000/scheduler.pt +3 -0
  41. checkpoint-2000/special_tokens_map.json +24 -0
  42. checkpoint-2000/tokenizer.json +0 -0
  43. checkpoint-2000/tokenizer.model +3 -0
  44. checkpoint-2000/tokenizer_config.json +44 -0
  45. checkpoint-2000/trainer_state.json +1466 -0
  46. checkpoint-2000/training_args.bin +3 -0
  47. checkpoint-2500/README.md +207 -0
  48. checkpoint-2500/adapter_config.json +41 -0
  49. checkpoint-2500/adapter_model.safetensors +3 -0
  50. checkpoint-2500/chat_template.jinja +1 -0
README.md ADDED
@@ -0,0 +1,276 @@
# BioMistral-7B LoRA Fine-tuned on MedQuAD

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B) for medical question answering, trained on the MedQuAD dataset from Kaggle.

## Model Description

- **Base Model**: BioMistral/BioMistral-7B
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Model Type**: Causal Language Model
- **Training Dataset**: MedQuAD (Medical Question Answering Dataset)
- **Domain**: Medical/Biomedical
- **Language**: English
- **License**: Apache 2.0 (inherited from the base model)

## Dataset Information

### MedQuAD Dataset
- **Source**: [MedQuAD on Kaggle](https://www.kaggle.com/datasets/jpmiller/medquad)
- **Full Name**: Medical Question Answering Dataset
- **Description**: A collection of medical questions and answers from trusted medical sources
- **Training Examples**: 14,770 question-answer pairs
- **Validation Examples**: 1,642 question-answer pairs
- **Format**: Instruction-Input-Output triplets for medical Q&A (one example is sketched below)

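A minimal sketch of how a MedQuAD question-answer pair could be rendered into this triplet format; the `format_example` helper is illustrative, since the actual preprocessing script is not included in this repository:

```python
# Illustrative only: mirrors the prompt layout used in the Usage examples
# below; the real preprocessing code is not part of this repo.
def format_example(question: str, answer: str, context: str = "") -> str:
    if context.strip():
        return (f"### Instruction:\n{question}\n\n"
                f"### Input:\n{context}\n\n"
                f"### Response:\n{answer}")
    return f"### Instruction:\n{question}\n\n### Response:\n{answer}"

print(format_example(
    "What is glaucoma?",
    "Glaucoma is a group of eye diseases that damage the optic nerve.",
))
```
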
### Data Sources (MedQuAD)
The MedQuAD dataset contains medical information from various authoritative sources including:
- National Institutes of Health (NIH)
- National Cancer Institute (NCI)
- National Institute of Mental Health (NIMH)
- Centers for Disease Control and Prevention (CDC)
- And other trusted medical organizations

## Training Details

### Training Configuration
- **Training Steps**: 2,772 (3 epochs)
- **Batch Size**: 2 per device
- **Gradient Accumulation**: 8 steps
- **Effective Batch Size**: 16
- **Learning Rate**: 2e-4
- **Warmup Steps**: 100
- **Max Sequence Length**: 512
- **Optimizer**: AdamW
- **Precision**: FP16

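A sketch of how these settings map onto `transformers.TrainingArguments`. The output directory and the evaluation/save/logging cadence match the `trainer_state.json` shipped with the checkpoints below; the actual training script is not included, so treat this as a reconstruction:

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the documented hyperparameters.
training_args = TrainingArguments(
    output_dir="./biomistral-lora-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size 2 * 8 = 16
    learning_rate=2e-4,
    warmup_steps=100,
    fp16=True,
    eval_strategy="steps",           # `evaluation_strategy` on older transformers
    eval_steps=500,
    save_steps=500,
    logging_steps=10,
)
```
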
### LoRA Configuration
- **LoRA Rank (r)**: 16
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.1
- **Target Modules**: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- **Trainable Parameters**: ~41.9M, about 0.6% of the 7B base (consistent with the ~168 MB fp32 adapter file)

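The same hyperparameters appear in the `adapter_config.json` shipped in this repo; reconstructed as a PEFT config, the setup is roughly:

```python
from peft import LoraConfig

# Mirrors the values in adapter_config.json below.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# model = get_peft_model(base_model, lora_config)  # then train as usual
```
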
### Training Results
| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 500  | 0.8277        | 0.8332          |
| 1000 | 0.5424        | 0.8180          |
| 1500 | 0.5696        | 0.7986          |
| 2000 | 0.3430        | 0.8451          |
| 2500 | 0.3184        | 0.8488          |

**Final Validation Loss**: 0.8488

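Since the loss is a token-level cross-entropy, it converts to perplexity via exp(loss); the final validation loss therefore corresponds to a perplexity of roughly 2.34:

```python
import math

final_val_loss = 0.8488
print(f"Validation perplexity: {math.exp(final_val_loss):.2f}")  # ~2.34
```
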
## Installation

```bash
pip install transformers peft torch accelerate bitsandbytes
```

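`bitsandbytes` is only needed if you want to load the base model quantized to save GPU memory; the usage examples below load in fp16. A 4-bit loading sketch (optional, and not how the model was trained):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "BioMistral/BioMistral-7B",
    quantization_config=bnb_config,
    device_map="auto",
)
```
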
## Usage

### Option 1: Using the Full Fine-tuned Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model
model_name = "ayuwal12/biomistral-7b-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16
)

def generate_medical_response(question, context="", max_new_tokens=256):
    # Format the prompt for medical Q&A (same layout as the training data)
    if context.strip():
        prompt = f"### Instruction:\n{question}\n\n### Input:\n{context}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{question}\n\n### Response:\n"

    # Tokenize and generate
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    # Decode and keep only the text after the response marker
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:\n")[-1].strip()

# Example usage
response = generate_medical_response("What is diabetes and what are its main types?")
print(response)
```

### Option 2: Using LoRA Adapters (Recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model_name = "BioMistral/BioMistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype=torch.float16
)

# Load LoRA adapters
lora_model_name = "ayuwal12/biomistral-7b-lora-adapters"
model = PeftModel.from_pretrained(base_model, lora_model_name)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def generate_medical_response(question, context="", max_new_tokens=256):
    if context.strip():
        prompt = f"### Instruction:\n{question}\n\n### Input:\n{context}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{question}\n\n### Response:\n"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:\n")[-1].strip()

# Example usage
response = generate_medical_response("What are the symptoms of hypertension?")
print(response)
```

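Continuing from the Option 2 snippet, the adapters can also be folded into the base weights so inference no longer needs the PEFT wrapper (`merge_and_unload` is the standard PEFT method; the save path here is just an example):

```python
# Merge the LoRA deltas into the base model for plain transformers inference.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./biomistral-7b-medquad-merged")  # example path
tokenizer.save_pretrained("./biomistral-7b-medquad-merged")
```
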
## Example Medical Questions

### General Medical Questions
```python
question = "What is hypertension and how is it diagnosed?"
response = generate_medical_response(question)
```

### Symptoms and Conditions
```python
question = "What are the common symptoms of type 2 diabetes?"
response = generate_medical_response(question)
```

### Treatment and Management
```python
question = "How is high blood pressure treated?"
response = generate_medical_response(question)
```

### With Medical Context
```python
question = "What should I know about this condition?"
context = "Patient has been diagnosed with stage 1 hypertension"
response = generate_medical_response(question, context)
```

## Model Performance

- **Training Loss**: Decreased from 0.83 to 0.32 over 3 epochs
- **Validation Loss**: Reached its minimum of ~0.80 around step 1500, then drifted up to ~0.85 by step 2500
- **Convergence**: The model learned the task quickly; the late rise in validation loss suggests mild overfitting in the final epoch, so the step-1500 checkpoint may generalize best
- **Memory Efficiency**: Trains only ~41.9M of ~7.2B parameters (~0.6%) via LoRA
- **Domain**: Specialized for medical question answering

## Capabilities

This model is intended to be strongest at:
- ✅ **Medical Question Answering**: Trained specifically on medical Q&A pairs
- ✅ **Disease Information**: Provides information about various medical conditions
- ✅ **Symptom Analysis**: Explains symptoms and their significance
- ✅ **Treatment Overview**: Discusses general treatment approaches
- ✅ **Medical Terminology**: Understands and explains medical terms

## Limitations

- Based on BioMistral-7B and inherits its limitations
- Trained on the MedQuAD dataset, so it may not cover all medical domains equally
- **Not for diagnosis**: Cannot replace professional medical evaluation
- **Information only**: Provides general medical information, not personalized advice
- May not reflect the most recent medical research (depends on the training data cutoff)

## Intended Use

This model is designed for:
- 📚 **Educational purposes** in medical and healthcare domains
- 🔬 **Research applications** in biomedical NLP
- 💡 **Medical information retrieval** systems
- 🏥 **Healthcare chatbots** (with appropriate disclaimers)
- 📖 **Medical knowledge base** applications

## Ethical Considerations & Medical Disclaimer

⚠️ **IMPORTANT MEDICAL DISCLAIMER**:
- This model is for **educational and research purposes only**
- **NOT for medical diagnosis** or treatment decisions
- Always consult qualified healthcare professionals for medical advice
- AI-generated medical content may contain errors or biases
- Do not use this model in emergency medical situations
- Individual medical conditions require personalized professional care

## Dataset Citation

```bibtex
@misc{medquad,
  title        = {MedQuAD: Medical Question Answering Dataset},
  author       = {Ben Abacha, Asma and Mrabet, Yassine and Zhang, Yuhao and Shivade, Chaitanya and Langlotz, Curtis and Demner-Fushman, Dina},
  year         = {2019},
  howpublished = {Available on Kaggle: https://www.kaggle.com/datasets/jpmiller/medquad}
}
```

## Model Citation

If you use this model, please cite:

```bibtex
@misc{biomistral-medquad-lora,
  title        = {BioMistral-7B LoRA Fine-tuned on MedQuAD},
  author       = {Ayuwal},
  year         = {2024},
  howpublished = {https://huggingface.co/ayuwal12/biomistral-7b-finetuned}
}
```

## Acknowledgments

- **Base model**: [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B)
- **Training dataset**: [MedQuAD](https://www.kaggle.com/datasets/jpmiller/medquad)
- **LoRA implementation**: [PEFT](https://github.com/huggingface/peft)
- **Training framework**: [Transformers](https://github.com/huggingface/transformers)
- **Original MedQuAD authors**: Ben Abacha et al.

## Contact

For questions or issues, please open an issue on the model repository.

---

*This model was trained on the MedQuAD dataset and is intended for educational and research purposes in the medical domain. Always consult healthcare professionals for medical advice.*
adapter_config.json ADDED
@@ -0,0 +1,41 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "BioMistral/BioMistral-7B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "k_proj",
    "gate_proj",
    "up_proj",
    "q_proj",
    "o_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
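
The adapter metadata above can be inspected without downloading the 7B base weights (a small sketch using the standard PEFT API; repo id taken from the README):

```python
from peft import PeftConfig

config = PeftConfig.from_pretrained("ayuwal12/biomistral-7b-finetuned")
print(config.base_model_name_or_path)  # BioMistral/BioMistral-7B
print(config.r, config.lora_alpha)     # 16 32
```
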
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6acb6477671a04aa0dae759554a5d2784b51a1f041302953a830bf41dac335c0
size 167832240
chat_template.jinja ADDED
@@ -0,0 +1 @@
{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}
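
This is the standard Mistral-style [INST] chat template (distinct from the ### Instruction format used for fine-tuning). A quick way to see what it renders (sketch; repo id taken from the README):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ayuwal12/biomistral-7b-finetuned")
messages = [{"role": "user", "content": "What is anemia?"}]
print(tokenizer.apply_chat_template(messages, tokenize=False))
# -> <s>[INST] What is anemia? [/INST]
```
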
checkpoint-1000/README.md ADDED
@@ -0,0 +1,207 @@
---
base_model: BioMistral/BioMistral-7B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:BioMistral/BioMistral-7B
- lora
- transformers
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.16.0
checkpoint-1000/adapter_config.json ADDED
@@ -0,0 +1,41 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "BioMistral/BioMistral-7B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "k_proj",
    "gate_proj",
    "up_proj",
    "q_proj",
    "o_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
checkpoint-1000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d096215cd5a79308ef2f002f3ed29b12ebb25b4f3aa2b8600bafe263c05e2ec8
size 167832240
checkpoint-1000/chat_template.jinja ADDED
@@ -0,0 +1 @@
{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}
checkpoint-1000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c44d9952b18d7d82a941cb04693044a0be58b4d67b9bf87344262bda89b0d60
size 335922386
checkpoint-1000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9c248bdefa931c4b8818ef14890f078eb74e00ffacb25c16b33beb19deb757d
size 14244
checkpoint-1000/scaler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a25a54ef013052084cc1af4b9237b8bf9a919c4653e785c1b249c0020f99c494
size 988
checkpoint-1000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9829c541fbe820d9473a51158fb1381e97abcf18a78e66b970ecc18cee00706a
size 1064
checkpoint-1000/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
checkpoint-1000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1000/tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
size 493443
checkpoint-1000/tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}
checkpoint-1000/trainer_state.json ADDED
@@ -0,0 +1,750 @@
{
  "best_global_step": 1000,
  "best_metric": 0.8179630041122437,
  "best_model_checkpoint": "./biomistral-lora-finetuned/checkpoint-1000",
  "epoch": 1.0823290453622207,
  "eval_steps": 500,
  "global_step": 1000,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.010832769126607989,
      "grad_norm": 0.7727395296096802,
      "learning_rate": 1.8e-05,
      "loss": 0.889,
      "step": 10
    },
    {
      "epoch": 0.021665538253215978,
      "grad_norm": 0.8008129596710205,
      "learning_rate": 3.8e-05,
      "loss": 0.8378,
      "step": 20
    },
    {
      "epoch": 0.03249830737982397,
      "grad_norm": 0.9147247076034546,
      "learning_rate": 5.8e-05,
      "loss": 0.8108,
      "step": 30
    },
    {
      "epoch": 0.043331076506431955,
      "grad_norm": 0.8121607303619385,
      "learning_rate": 7.800000000000001e-05,
      "loss": 0.8597,
      "step": 40
    },
    {
      "epoch": 0.05416384563303995,
      "grad_norm": 1.0018593072891235,
      "learning_rate": 9.8e-05,
      "loss": 0.7486,
      "step": 50
    },
    {
      "epoch": 0.06499661475964794,
      "grad_norm": 1.2048218250274658,
      "learning_rate": 0.000118,
      "loss": 0.6825,
      "step": 60
    },
    {
      "epoch": 0.07582938388625593,
      "grad_norm": 0.9863468408584595,
      "learning_rate": 0.000138,
      "loss": 0.6539,
      "step": 70
    },
    {
      "epoch": 0.08666215301286391,
      "grad_norm": 1.2911494970321655,
      "learning_rate": 0.00015800000000000002,
      "loss": 0.6198,
      "step": 80
    },
    {
      "epoch": 0.0974949221394719,
      "grad_norm": 1.159672737121582,
      "learning_rate": 0.00017800000000000002,
      "loss": 0.6222,
      "step": 90
    },
    {
      "epoch": 0.1083276912660799,
      "grad_norm": 1.0924432277679443,
      "learning_rate": 0.00019800000000000002,
      "loss": 0.5923,
      "step": 100
    },
    {
      "epoch": 0.11916046039268788,
      "grad_norm": 1.3423463106155396,
      "learning_rate": 0.00019932634730538925,
      "loss": 0.5548,
      "step": 110
    },
    {
      "epoch": 0.12999322951929587,
      "grad_norm": 1.4929102659225464,
      "learning_rate": 0.00019857784431137723,
      "loss": 0.6701,
      "step": 120
    },
    {
      "epoch": 0.14082599864590387,
      "grad_norm": 0.9462954998016357,
      "learning_rate": 0.00019782934131736527,
      "loss": 0.8675,
      "step": 130
    },
    {
      "epoch": 0.15165876777251186,
      "grad_norm": 0.9912289977073669,
      "learning_rate": 0.0001970808383233533,
      "loss": 0.9074,
      "step": 140
    },
    {
      "epoch": 0.16249153689911983,
      "grad_norm": 1.1070538759231567,
      "learning_rate": 0.00019633233532934132,
      "loss": 0.8755,
      "step": 150
    },
    {
      "epoch": 0.17332430602572782,
      "grad_norm": 0.9465340375900269,
      "learning_rate": 0.00019558383233532936,
      "loss": 0.882,
      "step": 160
    },
    {
      "epoch": 0.18415707515233581,
      "grad_norm": 0.8657329678535461,
      "learning_rate": 0.00019483532934131737,
      "loss": 0.8737,
      "step": 170
    },
    {
      "epoch": 0.1949898442789438,
      "grad_norm": 0.7293577790260315,
      "learning_rate": 0.0001940868263473054,
      "loss": 0.8473,
      "step": 180
    },
    {
      "epoch": 0.2058226134055518,
      "grad_norm": 0.849353551864624,
      "learning_rate": 0.00019333832335329343,
      "loss": 0.9414,
      "step": 190
    },
    {
      "epoch": 0.2166553825321598,
      "grad_norm": 0.7525314688682556,
      "learning_rate": 0.00019258982035928144,
      "loss": 0.8852,
      "step": 200
    },
    {
      "epoch": 0.22748815165876776,
      "grad_norm": 1.0732208490371704,
      "learning_rate": 0.00019184131736526948,
      "loss": 0.8074,
      "step": 210
    },
    {
      "epoch": 0.23832092078537576,
      "grad_norm": 0.8420374393463135,
      "learning_rate": 0.0001910928143712575,
      "loss": 0.9508,
      "step": 220
    },
    {
      "epoch": 0.24915368991198375,
      "grad_norm": 0.8308244347572327,
      "learning_rate": 0.0001903443113772455,
      "loss": 0.8734,
      "step": 230
    },
    {
      "epoch": 0.25998645903859174,
      "grad_norm": 0.9915153384208679,
      "learning_rate": 0.00018959580838323354,
      "loss": 0.8816,
      "step": 240
    },
    {
      "epoch": 0.2708192281651997,
      "grad_norm": 4.8621978759765625,
      "learning_rate": 0.00018884730538922158,
      "loss": 0.8848,
      "step": 250
    },
    {
      "epoch": 0.28165199729180773,
      "grad_norm": 0.7945590019226074,
      "learning_rate": 0.0001880988023952096,
      "loss": 0.8503,
      "step": 260
    },
    {
      "epoch": 0.2924847664184157,
      "grad_norm": 0.7896672487258911,
      "learning_rate": 0.00018735029940119763,
      "loss": 0.8798,
      "step": 270
    },
    {
      "epoch": 0.3033175355450237,
      "grad_norm": 0.8870701789855957,
      "learning_rate": 0.00018660179640718564,
      "loss": 0.9112,
      "step": 280
    },
    {
      "epoch": 0.3141503046716317,
      "grad_norm": 0.9003740549087524,
      "learning_rate": 0.00018585329341317365,
      "loss": 0.846,
      "step": 290
    },
    {
      "epoch": 0.32498307379823965,
      "grad_norm": 0.7067676186561584,
      "learning_rate": 0.0001851047904191617,
      "loss": 0.8588,
      "step": 300
    },
    {
      "epoch": 0.3358158429248477,
      "grad_norm": 0.9696246385574341,
      "learning_rate": 0.0001843562874251497,
      "loss": 0.8244,
      "step": 310
    },
    {
      "epoch": 0.34664861205145564,
      "grad_norm": 0.9892609715461731,
      "learning_rate": 0.00018360778443113774,
      "loss": 0.8214,
      "step": 320
    },
    {
      "epoch": 0.35748138117806366,
      "grad_norm": 0.822260856628418,
      "learning_rate": 0.00018285928143712575,
      "loss": 0.7977,
      "step": 330
    },
    {
      "epoch": 0.36831415030467163,
      "grad_norm": 0.7743964791297913,
      "learning_rate": 0.00018211077844311376,
      "loss": 0.8002,
      "step": 340
    },
    {
      "epoch": 0.3791469194312796,
      "grad_norm": 0.7090775370597839,
      "learning_rate": 0.0001813622754491018,
      "loss": 0.8192,
      "step": 350
    },
    {
      "epoch": 0.3899796885578876,
      "grad_norm": 1.0970802307128906,
      "learning_rate": 0.00018061377245508984,
      "loss": 0.8516,
      "step": 360
    },
    {
      "epoch": 0.4008124576844956,
      "grad_norm": 0.9633163213729858,
      "learning_rate": 0.00017986526946107785,
      "loss": 0.8414,
      "step": 370
    },
    {
      "epoch": 0.4116452268111036,
      "grad_norm": 0.6846926808357239,
      "learning_rate": 0.00017911676646706587,
      "loss": 0.8187,
      "step": 380
    },
    {
      "epoch": 0.42247799593771157,
      "grad_norm": 0.7262110710144043,
      "learning_rate": 0.0001783682634730539,
      "loss": 0.8572,
      "step": 390
    },
    {
      "epoch": 0.4333107650643196,
      "grad_norm": 0.8537372350692749,
      "learning_rate": 0.00017761976047904192,
      "loss": 0.8286,
      "step": 400
    },
    {
      "epoch": 0.44414353419092756,
      "grad_norm": 0.8860271573066711,
      "learning_rate": 0.00017687125748502996,
      "loss": 0.8416,
      "step": 410
    },
    {
      "epoch": 0.4549763033175355,
      "grad_norm": 0.7984218597412109,
      "learning_rate": 0.000176122754491018,
      "loss": 0.8373,
      "step": 420
    },
    {
      "epoch": 0.46580907244414355,
      "grad_norm": 0.8060943484306335,
      "learning_rate": 0.000175374251497006,
      "loss": 0.9165,
      "step": 430
    },
    {
      "epoch": 0.4766418415707515,
      "grad_norm": 0.7871391177177429,
      "learning_rate": 0.00017462574850299402,
      "loss": 0.8276,
      "step": 440
    },
    {
      "epoch": 0.48747461069735953,
      "grad_norm": 0.7732688784599304,
      "learning_rate": 0.00017387724550898203,
      "loss": 0.8346,
      "step": 450
    },
    {
      "epoch": 0.4983073798239675,
      "grad_norm": 0.9314000606536865,
      "learning_rate": 0.00017312874251497007,
      "loss": 0.8291,
      "step": 460
    },
    {
      "epoch": 0.5091401489505755,
      "grad_norm": 0.6721988916397095,
      "learning_rate": 0.0001723802395209581,
      "loss": 0.7091,
      "step": 470
    },
    {
      "epoch": 0.5199729180771835,
      "grad_norm": 0.825965940952301,
      "learning_rate": 0.00017163173652694612,
      "loss": 0.8934,
      "step": 480
    },
    {
      "epoch": 0.5308056872037915,
      "grad_norm": 0.8427668213844299,
      "learning_rate": 0.00017088323353293413,
      "loss": 0.7603,
      "step": 490
    },
    {
      "epoch": 0.5416384563303994,
      "grad_norm": 1.0061259269714355,
      "learning_rate": 0.00017013473053892217,
      "loss": 0.8277,
      "step": 500
    },
    {
      "epoch": 0.5416384563303994,
      "eval_loss": 0.8331602811813354,
      "eval_runtime": 355.9061,
      "eval_samples_per_second": 4.614,
      "eval_steps_per_second": 2.307,
      "step": 500
    },
    {
      "epoch": 0.5524712254570074,
      "grad_norm": 0.8820628523826599,
      "learning_rate": 0.00016938622754491018,
      "loss": 0.8348,
      "step": 510
    },
    {
      "epoch": 0.5633039945836155,
      "grad_norm": 0.8095284700393677,
      "learning_rate": 0.00016863772455089822,
      "loss": 0.9172,
      "step": 520
    },
    {
      "epoch": 0.5741367637102234,
      "grad_norm": 0.6959540843963623,
      "learning_rate": 0.00016788922155688623,
      "loss": 0.838,
      "step": 530
    },
    {
      "epoch": 0.5849695328368314,
      "grad_norm": 0.835831880569458,
      "learning_rate": 0.00016714071856287424,
      "loss": 0.8887,
      "step": 540
    },
    {
      "epoch": 0.5958023019634394,
      "grad_norm": 0.9289611577987671,
      "learning_rate": 0.00016639221556886228,
      "loss": 0.8514,
      "step": 550
    },
    {
      "epoch": 0.6066350710900474,
      "grad_norm": 0.6904628872871399,
      "learning_rate": 0.00016564371257485032,
      "loss": 0.8645,
      "step": 560
    },
    {
      "epoch": 0.6174678402166554,
      "grad_norm": 0.8879178762435913,
      "learning_rate": 0.00016489520958083833,
      "loss": 0.8201,
      "step": 570
    },
    {
      "epoch": 0.6283006093432634,
      "grad_norm": 0.8411425948143005,
      "learning_rate": 0.00016414670658682637,
      "loss": 0.836,
      "step": 580
    },
    {
      "epoch": 0.6391333784698714,
      "grad_norm": 0.8564555644989014,
      "learning_rate": 0.00016339820359281436,
      "loss": 0.7724,
      "step": 590
    },
    {
      "epoch": 0.6499661475964793,
      "grad_norm": 0.8382830619812012,
      "learning_rate": 0.0001626497005988024,
      "loss": 0.7839,
      "step": 600
    },
    {
      "epoch": 0.6607989167230873,
      "grad_norm": 0.7657437920570374,
      "learning_rate": 0.00016190119760479043,
      "loss": 0.7973,
      "step": 610
    },
    {
      "epoch": 0.6716316858496953,
      "grad_norm": 0.7758445143699646,
      "learning_rate": 0.00016115269461077845,
      "loss": 0.8111,
      "step": 620
    },
    {
      "epoch": 0.6824644549763034,
      "grad_norm": 1.0041533708572388,
      "learning_rate": 0.00016040419161676649,
      "loss": 0.8359,
      "step": 630
    },
    {
      "epoch": 0.6932972241029113,
      "grad_norm": 0.9679577946662903,
      "learning_rate": 0.0001596556886227545,
      "loss": 0.8822,
      "step": 640
    },
    {
      "epoch": 0.7041299932295193,
      "grad_norm": 0.8141391277313232,
      "learning_rate": 0.0001589071856287425,
      "loss": 0.8714,
      "step": 650
    },
    {
      "epoch": 0.7149627623561273,
      "grad_norm": 0.7982810139656067,
      "learning_rate": 0.00015815868263473055,
      "loss": 0.856,
      "step": 660
    },
    {
      "epoch": 0.7257955314827352,
      "grad_norm": 0.7932000160217285,
      "learning_rate": 0.00015741017964071859,
      "loss": 0.8405,
      "step": 670
    },
    {
      "epoch": 0.7366283006093433,
      "grad_norm": 0.7269508242607117,
      "learning_rate": 0.0001566616766467066,
      "loss": 0.8371,
      "step": 680
    },
    {
      "epoch": 0.7474610697359513,
      "grad_norm": 0.9001722931861877,
      "learning_rate": 0.0001559131736526946,
      "loss": 0.8305,
      "step": 690
    },
    {
      "epoch": 0.7582938388625592,
      "grad_norm": 0.6795508861541748,
      "learning_rate": 0.00015516467065868262,
      "loss": 0.8324,
      "step": 700
    },
    {
      "epoch": 0.7691266079891672,
      "grad_norm": 0.8868729472160339,
      "learning_rate": 0.00015441616766467066,
      "loss": 0.8521,
      "step": 710
    },
    {
      "epoch": 0.7799593771157752,
      "grad_norm": 0.9720478653907776,
      "learning_rate": 0.0001536676646706587,
      "loss": 0.7759,
      "step": 720
    },
    {
      "epoch": 0.7907921462423833,
      "grad_norm": 0.8006075620651245,
      "learning_rate": 0.0001529191616766467,
      "loss": 0.7981,
      "step": 730
    },
    {
      "epoch": 0.8016249153689912,
      "grad_norm": 0.9107721447944641,
      "learning_rate": 0.00015217065868263475,
      "loss": 0.7868,
      "step": 740
    },
    {
      "epoch": 0.8124576844955992,
      "grad_norm": 0.7584466338157654,
      "learning_rate": 0.00015142215568862276,
      "loss": 0.7401,
      "step": 750
    },
    {
      "epoch": 0.8232904536222072,
      "grad_norm": 1.0075221061706543,
      "learning_rate": 0.00015067365269461077,
      "loss": 0.8024,
      "step": 760
    },
    {
      "epoch": 0.8341232227488151,
      "grad_norm": 0.8769344091415405,
      "learning_rate": 0.0001499251497005988,
      "loss": 0.7779,
      "step": 770
    },
    {
      "epoch": 0.8449559918754231,
      "grad_norm": 0.84312903881073,
      "learning_rate": 0.00014917664670658685,
      "loss": 0.8314,
      "step": 780
    },
    {
      "epoch": 0.8557887610020312,
      "grad_norm": 0.8116353750228882,
      "learning_rate": 0.00014842814371257486,
      "loss": 0.8146,
      "step": 790
    },
    {
      "epoch": 0.8666215301286392,
      "grad_norm": 0.8301011919975281,
      "learning_rate": 0.00014767964071856287,
      "loss": 0.7422,
      "step": 800
    },
    {
      "epoch": 0.8774542992552471,
      "grad_norm": 0.8579692244529724,
      "learning_rate": 0.00014693113772455091,
      "loss": 0.7442,
      "step": 810
    },
    {
      "epoch": 0.8882870683818551,
      "grad_norm": 0.7513943910598755,
      "learning_rate": 0.00014618263473053893,
      "loss": 0.7671,
      "step": 820
    },
    {
      "epoch": 0.8991198375084631,
      "grad_norm": 0.9639107584953308,
      "learning_rate": 0.00014543413173652696,
      "loss": 0.7896,
      "step": 830
    },
    {
      "epoch": 0.909952606635071,
      "grad_norm": 0.8897636532783508,
      "learning_rate": 0.00014468562874251498,
      "loss": 0.7613,
      "step": 840
    },
    {
      "epoch": 0.9207853757616791,
      "grad_norm": 0.7998213171958923,
      "learning_rate": 0.000143937125748503,
      "loss": 0.7647,
      "step": 850
    },
    {
      "epoch": 0.9316181448882871,
      "grad_norm": 0.6916050910949707,
      "learning_rate": 0.00014318862275449103,
      "loss": 0.7697,
      "step": 860
    },
    {
      "epoch": 0.942450914014895,
      "grad_norm": 1.0154324769973755,
      "learning_rate": 0.00014244011976047904,
      "loss": 0.7314,
      "step": 870
    },
    {
      "epoch": 0.953283683141503,
      "grad_norm": 0.9787517786026001,
      "learning_rate": 0.00014169161676646708,
      "loss": 0.8047,
      "step": 880
    },
    {
      "epoch": 0.964116452268111,
      "grad_norm": 0.6035457253456116,
      "learning_rate": 0.00014094311377245512,
      "loss": 0.783,
      "step": 890
    },
    {
      "epoch": 0.9749492213947191,
      "grad_norm": 0.940951943397522,
      "learning_rate": 0.0001401946107784431,
      "loss": 0.7741,
      "step": 900
    },
    {
      "epoch": 0.985781990521327,
      "grad_norm": 0.7785654067993164,
      "learning_rate": 0.00013944610778443114,
      "loss": 0.7855,
      "step": 910
    },
    {
      "epoch": 0.996614759647935,
      "grad_norm": 0.8356137275695801,
      "learning_rate": 0.00013869760479041918,
      "loss": 0.8292,
      "step": 920
    },
    {
      "epoch": 1.0064996614759647,
      "grad_norm": 0.6590499877929688,
      "learning_rate": 0.0001379491017964072,
      "loss": 0.6858,
      "step": 930
    },
    {
      "epoch": 1.0173324306025728,
      "grad_norm": 1.0389671325683594,
      "learning_rate": 0.00013720059880239523,
      "loss": 0.6097,
      "step": 940
    },
    {
      "epoch": 1.0281651997291807,
      "grad_norm": 0.9596243500709534,
      "learning_rate": 0.00013645209580838324,
      "loss": 0.5676,
      "step": 950
    },
    {
      "epoch": 1.0389979688557887,
      "grad_norm": 1.0831798315048218,
      "learning_rate": 0.00013570359281437125,
      "loss": 0.6106,
      "step": 960
    },
    {
      "epoch": 1.0498307379823968,
      "grad_norm": 0.92978835105896,
      "learning_rate": 0.0001349550898203593,
      "loss": 0.5924,
      "step": 970
    },
    {
      "epoch": 1.0606635071090047,
      "grad_norm": 0.9672062993049622,
      "learning_rate": 0.0001342065868263473,
      "loss": 0.5496,
      "step": 980
    },
    {
      "epoch": 1.0714962762356128,
      "grad_norm": 1.1402652263641357,
      "learning_rate": 0.00013345808383233534,
      "loss": 0.5871,
      "step": 990
    },
    {
      "epoch": 1.0823290453622207,
      "grad_norm": 1.1109035015106201,
      "learning_rate": 0.00013270958083832335,
      "loss": 0.5424,
      "step": 1000
    },
    {
      "epoch": 1.0823290453622207,
      "eval_loss": 0.8179630041122437,
      "eval_runtime": 357.2769,
      "eval_samples_per_second": 4.596,
      "eval_steps_per_second": 2.298,
      "step": 1000
    }
  ],
  "logging_steps": 10,
  "max_steps": 2772,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 3,
  "save_steps": 500,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 3.523118244330209e+17,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
}
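
The `log_history` array above holds the raw loss curve; a small sketch of pulling it out for inspection or plotting:

```python
import json

# Extract (step, loss) pairs from a checkpoint's trainer state.
with open("checkpoint-1000/trainer_state.json") as f:
    state = json.load(f)

train_points = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
eval_points = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]
print(train_points[-1], eval_points[-1])  # (1000, 0.5424) (1000, 0.8179630041122437)
```
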
checkpoint-1000/training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb57b3addd13a91af4f53634dd1e6a17845286b1cee5e38cd21da8e2bf179c7f
size 5304
checkpoint-1500/README.md ADDED
@@ -0,0 +1,207 @@
---
base_model: BioMistral/BioMistral-7B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:BioMistral/BioMistral-7B
- lora
- transformers
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.16.0
checkpoint-1500/adapter_config.json ADDED
@@ -0,0 +1,41 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "BioMistral/BioMistral-7B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "k_proj",
    "gate_proj",
    "up_proj",
    "q_proj",
    "o_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
checkpoint-1500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6acb6477671a04aa0dae759554a5d2784b51a1f041302953a830bf41dac335c0
size 167832240
checkpoint-1500/chat_template.jinja ADDED
@@ -0,0 +1 @@
{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}
checkpoint-1500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e24d1aeae71b0c686a0c00153e93e6c8332148b91f1a06464f5e7331284b5850
size 335922386
checkpoint-1500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6492fa1abb8928e85806aef548738bd43054b1594362687738367dfdf1836137
size 14244
checkpoint-1500/scaler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:54bb4f2ea251861747e8fc194eb844d57f95dac1c25d302b4ad59b349b681af6
size 988
checkpoint-1500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0455cd3e16cc5d63c9bdb4bcd02d9fd21bd515cbcda2087df9901523b6b81055
size 1064
checkpoint-1500/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
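
One detail worth flagging: `pad_token` is mapped to the EOS string `"</s>"`. Mistral ships no dedicated padding token, so reusing EOS (and masking padded positions out of the loss) is the standard workaround for batched causal-LM training; a sketch of the usual setup:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BioMistral/BioMistral-7B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # exactly what this map encodes
```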
checkpoint-1500/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1500/tokenizer.model ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
3
+ size 493443
checkpoint-1500/tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": null,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
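
Two settings here matter when preparing training examples: `add_bos_token` is true but `add_eos_token` is false, so the tokenizer never terminates a sequence on its own — EOS must come from the chat template or be appended manually — and `model_max_length` is the "unbounded" sentinel, so any length cap has to be passed explicitly. A sketch under those assumptions (the text and the 512-token cap are illustrative):

```python
# Sketch: terminate plain-text examples yourself when add_eos_token is false.
text = "Question: What is glaucoma? Answer: ..."  # illustrative
ids = tokenizer(text, truncation=True, max_length=511)["input_ids"]
ids.append(tokenizer.eos_token_id)  # at most 512 tokens total after appending EOS
```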
checkpoint-1500/trainer_state.json ADDED
@@ -0,0 +1,1108 @@
+ {
+   "best_global_step": 1500,
+   "best_metric": 0.7986094355583191,
+   "best_model_checkpoint": "./biomistral-lora-finetuned/checkpoint-1500",
+   "epoch": 1.6239675016926203,
+   "eval_steps": 500,
+   "global_step": 1500,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     { "epoch": 0.010832769126607989, "grad_norm": 0.7727395296096802, "learning_rate": 1.8e-05, "loss": 0.889, "step": 10 },
+     { "epoch": 0.021665538253215978, "grad_norm": 0.8008129596710205, "learning_rate": 3.8e-05, "loss": 0.8378, "step": 20 },
+     { "epoch": 0.03249830737982397, "grad_norm": 0.9147247076034546, "learning_rate": 5.8e-05, "loss": 0.8108, "step": 30 },
+     { "epoch": 0.043331076506431955, "grad_norm": 0.8121607303619385, "learning_rate": 7.800000000000001e-05, "loss": 0.8597, "step": 40 },
+     { "epoch": 0.05416384563303995, "grad_norm": 1.0018593072891235, "learning_rate": 9.8e-05, "loss": 0.7486, "step": 50 },
+     { "epoch": 0.06499661475964794, "grad_norm": 1.2048218250274658, "learning_rate": 0.000118, "loss": 0.6825, "step": 60 },
+     { "epoch": 0.07582938388625593, "grad_norm": 0.9863468408584595, "learning_rate": 0.000138, "loss": 0.6539, "step": 70 },
+     { "epoch": 0.08666215301286391, "grad_norm": 1.2911494970321655, "learning_rate": 0.00015800000000000002, "loss": 0.6198, "step": 80 },
+     { "epoch": 0.0974949221394719, "grad_norm": 1.159672737121582, "learning_rate": 0.00017800000000000002, "loss": 0.6222, "step": 90 },
+     { "epoch": 0.1083276912660799, "grad_norm": 1.0924432277679443, "learning_rate": 0.00019800000000000002, "loss": 0.5923, "step": 100 },
+     { "epoch": 0.11916046039268788, "grad_norm": 1.3423463106155396, "learning_rate": 0.00019932634730538925, "loss": 0.5548, "step": 110 },
+     { "epoch": 0.12999322951929587, "grad_norm": 1.4929102659225464, "learning_rate": 0.00019857784431137723, "loss": 0.6701, "step": 120 },
+     { "epoch": 0.14082599864590387, "grad_norm": 0.9462954998016357, "learning_rate": 0.00019782934131736527, "loss": 0.8675, "step": 130 },
+     { "epoch": 0.15165876777251186, "grad_norm": 0.9912289977073669, "learning_rate": 0.0001970808383233533, "loss": 0.9074, "step": 140 },
+     { "epoch": 0.16249153689911983, "grad_norm": 1.1070538759231567, "learning_rate": 0.00019633233532934132, "loss": 0.8755, "step": 150 },
+     { "epoch": 0.17332430602572782, "grad_norm": 0.9465340375900269, "learning_rate": 0.00019558383233532936, "loss": 0.882, "step": 160 },
+     { "epoch": 0.18415707515233581, "grad_norm": 0.8657329678535461, "learning_rate": 0.00019483532934131737, "loss": 0.8737, "step": 170 },
+     { "epoch": 0.1949898442789438, "grad_norm": 0.7293577790260315, "learning_rate": 0.0001940868263473054, "loss": 0.8473, "step": 180 },
+     { "epoch": 0.2058226134055518, "grad_norm": 0.849353551864624, "learning_rate": 0.00019333832335329343, "loss": 0.9414, "step": 190 },
+     { "epoch": 0.2166553825321598, "grad_norm": 0.7525314688682556, "learning_rate": 0.00019258982035928144, "loss": 0.8852, "step": 200 },
+     { "epoch": 0.22748815165876776, "grad_norm": 1.0732208490371704, "learning_rate": 0.00019184131736526948, "loss": 0.8074, "step": 210 },
+     { "epoch": 0.23832092078537576, "grad_norm": 0.8420374393463135, "learning_rate": 0.0001910928143712575, "loss": 0.9508, "step": 220 },
+     { "epoch": 0.24915368991198375, "grad_norm": 0.8308244347572327, "learning_rate": 0.0001903443113772455, "loss": 0.8734, "step": 230 },
+     { "epoch": 0.25998645903859174, "grad_norm": 0.9915153384208679, "learning_rate": 0.00018959580838323354, "loss": 0.8816, "step": 240 },
+     { "epoch": 0.2708192281651997, "grad_norm": 4.8621978759765625, "learning_rate": 0.00018884730538922158, "loss": 0.8848, "step": 250 },
+     { "epoch": 0.28165199729180773, "grad_norm": 0.7945590019226074, "learning_rate": 0.0001880988023952096, "loss": 0.8503, "step": 260 },
+     { "epoch": 0.2924847664184157, "grad_norm": 0.7896672487258911, "learning_rate": 0.00018735029940119763, "loss": 0.8798, "step": 270 },
+     { "epoch": 0.3033175355450237, "grad_norm": 0.8870701789855957, "learning_rate": 0.00018660179640718564, "loss": 0.9112, "step": 280 },
+     { "epoch": 0.3141503046716317, "grad_norm": 0.9003740549087524, "learning_rate": 0.00018585329341317365, "loss": 0.846, "step": 290 },
+     { "epoch": 0.32498307379823965, "grad_norm": 0.7067676186561584, "learning_rate": 0.0001851047904191617, "loss": 0.8588, "step": 300 },
+     { "epoch": 0.3358158429248477, "grad_norm": 0.9696246385574341, "learning_rate": 0.0001843562874251497, "loss": 0.8244, "step": 310 },
+     { "epoch": 0.34664861205145564, "grad_norm": 0.9892609715461731, "learning_rate": 0.00018360778443113774, "loss": 0.8214, "step": 320 },
+     { "epoch": 0.35748138117806366, "grad_norm": 0.822260856628418, "learning_rate": 0.00018285928143712575, "loss": 0.7977, "step": 330 },
+     { "epoch": 0.36831415030467163, "grad_norm": 0.7743964791297913, "learning_rate": 0.00018211077844311376, "loss": 0.8002, "step": 340 },
+     { "epoch": 0.3791469194312796, "grad_norm": 0.7090775370597839, "learning_rate": 0.0001813622754491018, "loss": 0.8192, "step": 350 },
+     { "epoch": 0.3899796885578876, "grad_norm": 1.0970802307128906, "learning_rate": 0.00018061377245508984, "loss": 0.8516, "step": 360 },
+     { "epoch": 0.4008124576844956, "grad_norm": 0.9633163213729858, "learning_rate": 0.00017986526946107785, "loss": 0.8414, "step": 370 },
+     { "epoch": 0.4116452268111036, "grad_norm": 0.6846926808357239, "learning_rate": 0.00017911676646706587, "loss": 0.8187, "step": 380 },
+     { "epoch": 0.42247799593771157, "grad_norm": 0.7262110710144043, "learning_rate": 0.0001783682634730539, "loss": 0.8572, "step": 390 },
+     { "epoch": 0.4333107650643196, "grad_norm": 0.8537372350692749, "learning_rate": 0.00017761976047904192, "loss": 0.8286, "step": 400 },
+     { "epoch": 0.44414353419092756, "grad_norm": 0.8860271573066711, "learning_rate": 0.00017687125748502996, "loss": 0.8416, "step": 410 },
+     { "epoch": 0.4549763033175355, "grad_norm": 0.7984218597412109, "learning_rate": 0.000176122754491018, "loss": 0.8373, "step": 420 },
+     { "epoch": 0.46580907244414355, "grad_norm": 0.8060943484306335, "learning_rate": 0.000175374251497006, "loss": 0.9165, "step": 430 },
+     { "epoch": 0.4766418415707515, "grad_norm": 0.7871391177177429, "learning_rate": 0.00017462574850299402, "loss": 0.8276, "step": 440 },
+     { "epoch": 0.48747461069735953, "grad_norm": 0.7732688784599304, "learning_rate": 0.00017387724550898203, "loss": 0.8346, "step": 450 },
+     { "epoch": 0.4983073798239675, "grad_norm": 0.9314000606536865, "learning_rate": 0.00017312874251497007, "loss": 0.8291, "step": 460 },
+     { "epoch": 0.5091401489505755, "grad_norm": 0.6721988916397095, "learning_rate": 0.0001723802395209581, "loss": 0.7091, "step": 470 },
+     { "epoch": 0.5199729180771835, "grad_norm": 0.825965940952301, "learning_rate": 0.00017163173652694612, "loss": 0.8934, "step": 480 },
+     { "epoch": 0.5308056872037915, "grad_norm": 0.8427668213844299, "learning_rate": 0.00017088323353293413, "loss": 0.7603, "step": 490 },
+     { "epoch": 0.5416384563303994, "grad_norm": 1.0061259269714355, "learning_rate": 0.00017013473053892217, "loss": 0.8277, "step": 500 },
+     { "epoch": 0.5416384563303994, "eval_loss": 0.8331602811813354, "eval_runtime": 355.9061, "eval_samples_per_second": 4.614, "eval_steps_per_second": 2.307, "step": 500 },
+     { "epoch": 0.5524712254570074, "grad_norm": 0.8820628523826599, "learning_rate": 0.00016938622754491018, "loss": 0.8348, "step": 510 },
+     { "epoch": 0.5633039945836155, "grad_norm": 0.8095284700393677, "learning_rate": 0.00016863772455089822, "loss": 0.9172, "step": 520 },
+     { "epoch": 0.5741367637102234, "grad_norm": 0.6959540843963623, "learning_rate": 0.00016788922155688623, "loss": 0.838, "step": 530 },
+     { "epoch": 0.5849695328368314, "grad_norm": 0.835831880569458, "learning_rate": 0.00016714071856287424, "loss": 0.8887, "step": 540 },
+     { "epoch": 0.5958023019634394, "grad_norm": 0.9289611577987671, "learning_rate": 0.00016639221556886228, "loss": 0.8514, "step": 550 },
+     { "epoch": 0.6066350710900474, "grad_norm": 0.6904628872871399, "learning_rate": 0.00016564371257485032, "loss": 0.8645, "step": 560 },
+     { "epoch": 0.6174678402166554, "grad_norm": 0.8879178762435913, "learning_rate": 0.00016489520958083833, "loss": 0.8201, "step": 570 },
+     { "epoch": 0.6283006093432634, "grad_norm": 0.8411425948143005, "learning_rate": 0.00016414670658682637, "loss": 0.836, "step": 580 },
+     { "epoch": 0.6391333784698714, "grad_norm": 0.8564555644989014, "learning_rate": 0.00016339820359281436, "loss": 0.7724, "step": 590 },
+     { "epoch": 0.6499661475964793, "grad_norm": 0.8382830619812012, "learning_rate": 0.0001626497005988024, "loss": 0.7839, "step": 600 },
+     { "epoch": 0.6607989167230873, "grad_norm": 0.7657437920570374, "learning_rate": 0.00016190119760479043, "loss": 0.7973, "step": 610 },
+     { "epoch": 0.6716316858496953, "grad_norm": 0.7758445143699646, "learning_rate": 0.00016115269461077845, "loss": 0.8111, "step": 620 },
+     { "epoch": 0.6824644549763034, "grad_norm": 1.0041533708572388, "learning_rate": 0.00016040419161676649, "loss": 0.8359, "step": 630 },
+     { "epoch": 0.6932972241029113, "grad_norm": 0.9679577946662903, "learning_rate": 0.0001596556886227545, "loss": 0.8822, "step": 640 },
+     { "epoch": 0.7041299932295193, "grad_norm": 0.8141391277313232, "learning_rate": 0.0001589071856287425, "loss": 0.8714, "step": 650 },
+     { "epoch": 0.7149627623561273, "grad_norm": 0.7982810139656067, "learning_rate": 0.00015815868263473055, "loss": 0.856, "step": 660 },
+     { "epoch": 0.7257955314827352, "grad_norm": 0.7932000160217285, "learning_rate": 0.00015741017964071859, "loss": 0.8405, "step": 670 },
+     { "epoch": 0.7366283006093433, "grad_norm": 0.7269508242607117, "learning_rate": 0.0001566616766467066, "loss": 0.8371, "step": 680 },
+     { "epoch": 0.7474610697359513, "grad_norm": 0.9001722931861877, "learning_rate": 0.0001559131736526946, "loss": 0.8305, "step": 690 },
+     { "epoch": 0.7582938388625592, "grad_norm": 0.6795508861541748, "learning_rate": 0.00015516467065868262, "loss": 0.8324, "step": 700 },
+     { "epoch": 0.7691266079891672, "grad_norm": 0.8868729472160339, "learning_rate": 0.00015441616766467066, "loss": 0.8521, "step": 710 },
+     { "epoch": 0.7799593771157752, "grad_norm": 0.9720478653907776, "learning_rate": 0.0001536676646706587, "loss": 0.7759, "step": 720 },
+     { "epoch": 0.7907921462423833, "grad_norm": 0.8006075620651245, "learning_rate": 0.0001529191616766467, "loss": 0.7981, "step": 730 },
+     { "epoch": 0.8016249153689912, "grad_norm": 0.9107721447944641, "learning_rate": 0.00015217065868263475, "loss": 0.7868, "step": 740 },
+     { "epoch": 0.8124576844955992, "grad_norm": 0.7584466338157654, "learning_rate": 0.00015142215568862276, "loss": 0.7401, "step": 750 },
+     { "epoch": 0.8232904536222072, "grad_norm": 1.0075221061706543, "learning_rate": 0.00015067365269461077, "loss": 0.8024, "step": 760 },
+     { "epoch": 0.8341232227488151, "grad_norm": 0.8769344091415405, "learning_rate": 0.0001499251497005988, "loss": 0.7779, "step": 770 },
+     { "epoch": 0.8449559918754231, "grad_norm": 0.84312903881073, "learning_rate": 0.00014917664670658685, "loss": 0.8314, "step": 780 },
+     { "epoch": 0.8557887610020312, "grad_norm": 0.8116353750228882, "learning_rate": 0.00014842814371257486, "loss": 0.8146, "step": 790 },
+     { "epoch": 0.8666215301286392, "grad_norm": 0.8301011919975281, "learning_rate": 0.00014767964071856287, "loss": 0.7422, "step": 800 },
+     { "epoch": 0.8774542992552471, "grad_norm": 0.8579692244529724, "learning_rate": 0.00014693113772455091, "loss": 0.7442, "step": 810 },
+     { "epoch": 0.8882870683818551, "grad_norm": 0.7513943910598755, "learning_rate": 0.00014618263473053893, "loss": 0.7671, "step": 820 },
+     { "epoch": 0.8991198375084631, "grad_norm": 0.9639107584953308, "learning_rate": 0.00014543413173652696, "loss": 0.7896, "step": 830 },
+     { "epoch": 0.909952606635071, "grad_norm": 0.8897636532783508, "learning_rate": 0.00014468562874251498, "loss": 0.7613, "step": 840 },
+     { "epoch": 0.9207853757616791, "grad_norm": 0.7998213171958923, "learning_rate": 0.000143937125748503, "loss": 0.7647, "step": 850 },
+     { "epoch": 0.9316181448882871, "grad_norm": 0.6916050910949707, "learning_rate": 0.00014318862275449103, "loss": 0.7697, "step": 860 },
+     { "epoch": 0.942450914014895, "grad_norm": 1.0154324769973755, "learning_rate": 0.00014244011976047904, "loss": 0.7314, "step": 870 },
+     { "epoch": 0.953283683141503, "grad_norm": 0.9787517786026001, "learning_rate": 0.00014169161676646708, "loss": 0.8047, "step": 880 },
+     { "epoch": 0.964116452268111, "grad_norm": 0.6035457253456116, "learning_rate": 0.00014094311377245512, "loss": 0.783, "step": 890 },
+     { "epoch": 0.9749492213947191, "grad_norm": 0.940951943397522, "learning_rate": 0.0001401946107784431, "loss": 0.7741, "step": 900 },
+     { "epoch": 0.985781990521327, "grad_norm": 0.7785654067993164, "learning_rate": 0.00013944610778443114, "loss": 0.7855, "step": 910 },
+     { "epoch": 0.996614759647935, "grad_norm": 0.8356137275695801, "learning_rate": 0.00013869760479041918, "loss": 0.8292, "step": 920 },
+     { "epoch": 1.0064996614759647, "grad_norm": 0.6590499877929688, "learning_rate": 0.0001379491017964072, "loss": 0.6858, "step": 930 },
+     { "epoch": 1.0173324306025728, "grad_norm": 1.0389671325683594, "learning_rate": 0.00013720059880239523, "loss": 0.6097, "step": 940 },
+     { "epoch": 1.0281651997291807, "grad_norm": 0.9596243500709534, "learning_rate": 0.00013645209580838324, "loss": 0.5676, "step": 950 },
+     { "epoch": 1.0389979688557887, "grad_norm": 1.0831798315048218, "learning_rate": 0.00013570359281437125, "loss": 0.6106, "step": 960 },
+     { "epoch": 1.0498307379823968, "grad_norm": 0.92978835105896, "learning_rate": 0.0001349550898203593, "loss": 0.5924, "step": 970 },
+     { "epoch": 1.0606635071090047, "grad_norm": 0.9672062993049622, "learning_rate": 0.0001342065868263473, "loss": 0.5496, "step": 980 },
+     { "epoch": 1.0714962762356128, "grad_norm": 1.1402652263641357, "learning_rate": 0.00013345808383233534, "loss": 0.5871, "step": 990 },
+     { "epoch": 1.0823290453622207, "grad_norm": 1.1109035015106201, "learning_rate": 0.00013270958083832335, "loss": 0.5424, "step": 1000 },
+     { "epoch": 1.0823290453622207, "eval_loss": 0.8179630041122437, "eval_runtime": 357.2769, "eval_samples_per_second": 4.596, "eval_steps_per_second": 2.298, "step": 1000 },
+     { "epoch": 1.0931618144888287, "grad_norm": 0.8117087483406067, "learning_rate": 0.00013196107784431137, "loss": 0.5636, "step": 1010 },
+     { "epoch": 1.1039945836154368, "grad_norm": 0.86320561170578, "learning_rate": 0.0001312125748502994, "loss": 0.5191, "step": 1020 },
+     { "epoch": 1.1148273527420447, "grad_norm": 1.1274133920669556, "learning_rate": 0.00013046407185628744, "loss": 0.5891, "step": 1030 },
+     { "epoch": 1.1256601218686526, "grad_norm": 1.0116336345672607, "learning_rate": 0.00012971556886227546, "loss": 0.5579, "step": 1040 },
+     { "epoch": 1.1364928909952607, "grad_norm": 0.9277855157852173, "learning_rate": 0.0001289670658682635, "loss": 0.5971, "step": 1050 },
+     { "epoch": 1.1473256601218687, "grad_norm": 1.0700503587722778, "learning_rate": 0.0001282185628742515, "loss": 0.5815, "step": 1060 },
+     { "epoch": 1.1581584292484766, "grad_norm": 0.9346574544906616, "learning_rate": 0.00012747005988023952, "loss": 0.5472, "step": 1070 },
+     { "epoch": 1.1689911983750847, "grad_norm": 1.047631025314331, "learning_rate": 0.00012672155688622756, "loss": 0.5479, "step": 1080 },
+     { "epoch": 1.1798239675016926, "grad_norm": 0.9931487441062927, "learning_rate": 0.00012597305389221557, "loss": 0.5521, "step": 1090 },
+     { "epoch": 1.1906567366283005, "grad_norm": 0.9764857292175293, "learning_rate": 0.0001252245508982036, "loss": 0.584, "step": 1100 },
+     { "epoch": 1.2014895057549086, "grad_norm": 1.0661903619766235, "learning_rate": 0.00012447604790419162, "loss": 0.6101, "step": 1110 },
+     { "epoch": 1.2123222748815166, "grad_norm": 1.0962295532226562, "learning_rate": 0.00012372754491017963, "loss": 0.6028, "step": 1120 },
+     { "epoch": 1.2231550440081245, "grad_norm": 0.9794766306877136, "learning_rate": 0.00012297904191616767, "loss": 0.5813, "step": 1130 },
+     { "epoch": 1.2339878131347326, "grad_norm": 0.9556275606155396, "learning_rate": 0.0001222305389221557, "loss": 0.5662, "step": 1140 },
+     { "epoch": 1.2448205822613405, "grad_norm": 1.1200224161148071, "learning_rate": 0.0001214820359281437, "loss": 0.5642, "step": 1150 },
+     { "epoch": 1.2556533513879486, "grad_norm": 1.0518434047698975, "learning_rate": 0.00012073353293413175, "loss": 0.6126, "step": 1160 },
+     { "epoch": 1.2664861205145566, "grad_norm": 1.1709963083267212, "learning_rate": 0.00011998502994011977, "loss": 0.5189, "step": 1170 },
+     { "epoch": 1.2773188896411645, "grad_norm": 0.8867760896682739, "learning_rate": 0.00011923652694610778, "loss": 0.6098, "step": 1180 },
+     { "epoch": 1.2881516587677724, "grad_norm": 0.9317127466201782, "learning_rate": 0.00011848802395209582, "loss": 0.5667, "step": 1190 },
+     { "epoch": 1.2989844278943805, "grad_norm": 1.1382100582122803, "learning_rate": 0.00011773952095808385, "loss": 0.5756, "step": 1200 },
+     { "epoch": 1.3098171970209884, "grad_norm": 0.9819681644439697, "learning_rate": 0.00011699101796407186, "loss": 0.5922, "step": 1210 },
+     { "epoch": 1.3206499661475966, "grad_norm": 1.0776174068450928, "learning_rate": 0.00011624251497005988, "loss": 0.5728, "step": 1220 },
+     { "epoch": 1.3314827352742045, "grad_norm": 1.0137302875518799, "learning_rate": 0.0001154940119760479, "loss": 0.5603, "step": 1230 },
+     { "epoch": 1.3423155044008124, "grad_norm": 1.1223585605621338, "learning_rate": 0.00011474550898203593, "loss": 0.5639, "step": 1240 },
+     { "epoch": 1.3531482735274205, "grad_norm": 0.8942229747772217, "learning_rate": 0.00011399700598802396, "loss": 0.586, "step": 1250 },
+     { "epoch": 1.3639810426540284, "grad_norm": 1.225698709487915, "learning_rate": 0.00011324850299401197, "loss": 0.563, "step": 1260 },
+     { "epoch": 1.3748138117806366, "grad_norm": 1.159463882446289, "learning_rate": 0.00011250000000000001, "loss": 0.5898, "step": 1270 },
+     { "epoch": 1.3856465809072445, "grad_norm": 1.0059807300567627, "learning_rate": 0.00011175149700598804, "loss": 0.6096, "step": 1280 },
+     { "epoch": 1.3964793500338524, "grad_norm": 1.1433062553405762, "learning_rate": 0.00011100299401197605, "loss": 0.5411, "step": 1290 },
+     { "epoch": 1.4073121191604603, "grad_norm": 1.0282905101776123, "learning_rate": 0.00011025449101796407, "loss": 0.5928, "step": 1300 },
+     { "epoch": 1.4181448882870684, "grad_norm": 0.8389853835105896, "learning_rate": 0.00010950598802395211, "loss": 0.5657, "step": 1310 },
+     { "epoch": 1.4289776574136763, "grad_norm": 1.132350206375122, "learning_rate": 0.00010875748502994012, "loss": 0.6196, "step": 1320 },
+     { "epoch": 1.4398104265402845, "grad_norm": 1.1093621253967285, "learning_rate": 0.00010800898203592815, "loss": 0.5845, "step": 1330 },
+     { "epoch": 1.4506431956668924, "grad_norm": 1.3198816776275635, "learning_rate": 0.00010726047904191616, "loss": 0.5711, "step": 1340 },
+     { "epoch": 1.4614759647935003, "grad_norm": 0.8968690037727356, "learning_rate": 0.0001065119760479042, "loss": 0.6075, "step": 1350 },
+     { "epoch": 1.4723087339201082, "grad_norm": 1.0248963832855225, "learning_rate": 0.00010576347305389222, "loss": 0.5869, "step": 1360 },
+     { "epoch": 1.4831415030467163, "grad_norm": 1.2115412950515747, "learning_rate": 0.00010501497005988024, "loss": 0.549, "step": 1370 },
+     { "epoch": 1.4939742721733242, "grad_norm": 1.1320476531982422, "learning_rate": 0.00010426646706586826, "loss": 0.5661, "step": 1380 },
+     { "epoch": 1.5048070412999324, "grad_norm": 1.0099844932556152, "learning_rate": 0.0001035179640718563, "loss": 0.5953, "step": 1390 },
+     { "epoch": 1.5156398104265403, "grad_norm": 0.9809553623199463, "learning_rate": 0.00010276946107784431, "loss": 0.578, "step": 1400 },
+     { "epoch": 1.5264725795531482, "grad_norm": 1.4169446229934692, "learning_rate": 0.00010202095808383234, "loss": 0.6173, "step": 1410 },
+     { "epoch": 1.537305348679756, "grad_norm": 1.1033852100372314, "learning_rate": 0.00010127245508982038, "loss": 0.5917, "step": 1420 },
+     { "epoch": 1.5481381178063642, "grad_norm": 1.1163372993469238, "learning_rate": 0.00010052395209580839, "loss": 0.589, "step": 1430 },
+     { "epoch": 1.5589708869329724, "grad_norm": 0.9786676168441772, "learning_rate": 9.977544910179641e-05, "loss": 0.5425, "step": 1440 },
+     { "epoch": 1.5698036560595803, "grad_norm": 1.034001111984253, "learning_rate": 9.902694610778444e-05, "loss": 0.5467, "step": 1450 },
+     { "epoch": 1.5806364251861882, "grad_norm": 0.8697665929794312, "learning_rate": 9.827844311377245e-05, "loss": 0.5882, "step": 1460 },
+     { "epoch": 1.591469194312796, "grad_norm": 1.0091935396194458, "learning_rate": 9.752994011976049e-05, "loss": 0.573, "step": 1470 },
+     { "epoch": 1.6023019634394042, "grad_norm": 1.0126501321792603, "learning_rate": 9.678143712574852e-05, "loss": 0.6083, "step": 1480 },
+     { "epoch": 1.6131347325660121, "grad_norm": 0.9271785020828247, "learning_rate": 9.603293413173653e-05, "loss": 0.5564, "step": 1490 },
+     { "epoch": 1.6239675016926203, "grad_norm": 1.0736253261566162, "learning_rate": 9.528443113772455e-05, "loss": 0.5696, "step": 1500 },
+     { "epoch": 1.6239675016926203, "eval_loss": 0.7986094355583191, "eval_runtime": 358.2761, "eval_samples_per_second": 4.583, "eval_steps_per_second": 2.292, "step": 1500 }
+   ],
+   "logging_steps": 10,
+   "max_steps": 2772,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 3,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 5.280941991033569e+17,
+   "train_batch_size": 2,
+   "trial_name": null,
+   "trial_params": null
+ }
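
This state is where the upload's headline numbers live: eval_loss falls 0.8332 → 0.8180 → 0.7986 across the three evaluations, so step 1500 is recorded as `best_model_checkpoint`. A few hyperparameters can also be inferred from the numbers (inferences, not stated anywhere in the files): 1500 steps at epoch 1.624 gives ~924 optimizer steps per epoch, and with `train_batch_size` 2 that suggests gradient accumulation of 8, i.e. an effective batch of 16 over a train split of roughly 14.8k examples, with ~1,642 eval samples (eval_runtime × eval_samples_per_second). Converting the logged eval losses to perplexity:

```python
import math

eval_losses = {500: 0.8331602811813354, 1000: 0.8179630041122437, 1500: 0.7986094355583191}
for step, loss in eval_losses.items():
    print(step, round(math.exp(loss), 3))  # 2.301, 2.266, 2.222 -- best at step 1500
```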
checkpoint-1500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb57b3addd13a91af4f53634dd1e6a17845286b1cee5e38cd21da8e2bf179c7f
+ size 5304
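
training_args.bin is a pickled `TrainingArguments` object (hence the ~5 KB size) and can be loaded back to inspect exactly which hyperparameters produced this run. A sketch — `weights_only=False` is needed on recent PyTorch because this is an arbitrary pickle, so only load checkpoints you trust:

```python
import torch

args = torch.load(
    "./biomistral-lora-finetuned/checkpoint-1500/training_args.bin", weights_only=False
)
print(args.learning_rate, args.per_device_train_batch_size, args.num_train_epochs)
```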
checkpoint-2000/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: BioMistral/BioMistral-7B
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:BioMistral/BioMistral-7B
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.16.0
checkpoint-2000/adapter_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "BioMistral/BioMistral-7B",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "down_proj",
+     "k_proj",
+     "gate_proj",
+     "up_proj",
+     "q_proj",
+     "o_proj",
+     "v_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
checkpoint-2000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd05c5221728172b32903a08c1f23e14e7940ee3d5966867b6bdf6832ca1577a
+ size 167832240
checkpoint-2000/chat_template.jinja ADDED
@@ -0,0 +1 @@
+ {{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}
checkpoint-2000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fe8da5c95eebc12a0c3cc50e53428ecb43730ad14835546d055560c72f55bc2
+ size 335922386
checkpoint-2000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24310892b0a3280ab0672041351f54ecc2135b28fd61730f51cebbc6be2c0466
+ size 14244
checkpoint-2000/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2534f71902c9d04ae6347b67f8675e467c1b2ff5627e9c11581ea7479caaba7c
+ size 988
checkpoint-2000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:41b7aede3a1bc21b4c97c5d108f5c1a0971999e3de6f79f45bc105eb7c419b8f
+ size 1064
checkpoint-2000/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-2000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-2000/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
checkpoint-2000/tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": null,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
checkpoint-2000/trainer_state.json ADDED
@@ -0,0 +1,1466 @@
+ {
+   "best_global_step": 1500,
+   "best_metric": 0.7986094355583191,
+   "best_model_checkpoint": "./biomistral-lora-finetuned/checkpoint-1500",
+   "epoch": 2.1646580907244415,
+   "eval_steps": 500,
+   "global_step": 2000,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     { "epoch": 0.010832769126607989, "grad_norm": 0.7727395296096802, "learning_rate": 1.8e-05, "loss": 0.889, "step": 10 },
+     { "epoch": 0.021665538253215978, "grad_norm": 0.8008129596710205, "learning_rate": 3.8e-05, "loss": 0.8378, "step": 20 },
+     { "epoch": 0.03249830737982397, "grad_norm": 0.9147247076034546, "learning_rate": 5.8e-05, "loss": 0.8108, "step": 30 },
+     { "epoch": 0.043331076506431955, "grad_norm": 0.8121607303619385, "learning_rate": 7.800000000000001e-05, "loss": 0.8597, "step": 40 },
+     { "epoch": 0.05416384563303995, "grad_norm": 1.0018593072891235, "learning_rate": 9.8e-05, "loss": 0.7486, "step": 50 },
+     { "epoch": 0.06499661475964794, "grad_norm": 1.2048218250274658, "learning_rate": 0.000118, "loss": 0.6825, "step": 60 },
+     { "epoch": 0.07582938388625593, "grad_norm": 0.9863468408584595, "learning_rate": 0.000138, "loss": 0.6539, "step": 70 },
+     { "epoch": 0.08666215301286391, "grad_norm": 1.2911494970321655, "learning_rate": 0.00015800000000000002, "loss": 0.6198, "step": 80 },
+     { "epoch": 0.0974949221394719, "grad_norm": 1.159672737121582, "learning_rate": 0.00017800000000000002, "loss": 0.6222, "step": 90 },
+     { "epoch": 0.1083276912660799, "grad_norm": 1.0924432277679443, "learning_rate": 0.00019800000000000002, "loss": 0.5923, "step": 100 },
+     { "epoch": 0.11916046039268788, "grad_norm": 1.3423463106155396, "learning_rate": 0.00019932634730538925, "loss": 0.5548, "step": 110 },
+     { "epoch": 0.12999322951929587, "grad_norm": 1.4929102659225464, "learning_rate": 0.00019857784431137723, "loss": 0.6701, "step": 120 },
+     { "epoch": 0.14082599864590387, "grad_norm": 0.9462954998016357, "learning_rate": 0.00019782934131736527, "loss": 0.8675, "step": 130 },
+     { "epoch": 0.15165876777251186, "grad_norm": 0.9912289977073669, "learning_rate": 0.0001970808383233533, "loss": 0.9074, "step": 140 },
+     { "epoch": 0.16249153689911983, "grad_norm": 1.1070538759231567, "learning_rate": 0.00019633233532934132, "loss": 0.8755, "step": 150 },
+     { "epoch": 0.17332430602572782, "grad_norm": 0.9465340375900269, "learning_rate": 0.00019558383233532936, "loss": 0.882, "step": 160 },
+     { "epoch": 0.18415707515233581, "grad_norm": 0.8657329678535461, "learning_rate": 0.00019483532934131737, "loss": 0.8737, "step": 170 },
+     { "epoch": 0.1949898442789438, "grad_norm": 0.7293577790260315, "learning_rate": 0.0001940868263473054, "loss": 0.8473, "step": 180 },
+     { "epoch": 0.2058226134055518, "grad_norm": 0.849353551864624, "learning_rate": 0.00019333832335329343, "loss": 0.9414, "step": 190 },
+     { "epoch": 0.2166553825321598, "grad_norm": 0.7525314688682556, "learning_rate": 0.00019258982035928144, "loss": 0.8852, "step": 200 },
+     { "epoch": 0.22748815165876776, "grad_norm": 1.0732208490371704, "learning_rate": 0.00019184131736526948, "loss": 0.8074, "step": 210 },
+     { "epoch": 0.23832092078537576, "grad_norm": 0.8420374393463135, "learning_rate": 0.0001910928143712575, "loss": 0.9508, "step": 220 },
+     { "epoch": 0.24915368991198375, "grad_norm": 0.8308244347572327, "learning_rate": 0.0001903443113772455, "loss": 0.8734, "step": 230 },
+     { "epoch": 0.25998645903859174, "grad_norm": 0.9915153384208679, "learning_rate": 0.00018959580838323354, "loss": 0.8816, "step": 240 },
+     { "epoch": 0.2708192281651997, "grad_norm": 4.8621978759765625, "learning_rate": 0.00018884730538922158, "loss": 0.8848, "step": 250 },
+     { "epoch": 0.28165199729180773, "grad_norm": 0.7945590019226074, "learning_rate": 0.0001880988023952096, "loss": 0.8503, "step": 260 },
+     { "epoch": 0.2924847664184157, "grad_norm": 0.7896672487258911, "learning_rate": 0.00018735029940119763, "loss": 0.8798, "step": 270 },
+     { "epoch": 0.3033175355450237, "grad_norm": 0.8870701789855957, "learning_rate": 0.00018660179640718564, "loss": 0.9112, "step": 280 },
+     { "epoch": 0.3141503046716317, "grad_norm": 0.9003740549087524, "learning_rate": 0.00018585329341317365, "loss": 0.846, "step": 290 },
+     { "epoch": 0.32498307379823965, "grad_norm": 0.7067676186561584, "learning_rate": 0.0001851047904191617, "loss": 0.8588, "step": 300 },
+     { "epoch": 0.3358158429248477, "grad_norm": 0.9696246385574341, "learning_rate": 0.0001843562874251497, "loss": 0.8244, "step": 310 },
+     { "epoch": 0.34664861205145564, "grad_norm": 0.9892609715461731, "learning_rate": 0.00018360778443113774, "loss": 0.8214, "step": 320 },
+     { "epoch": 0.35748138117806366, "grad_norm": 0.822260856628418, "learning_rate": 0.00018285928143712575, "loss": 0.7977, "step": 330 },
+     { "epoch": 0.36831415030467163, "grad_norm": 0.7743964791297913, "learning_rate": 0.00018211077844311376, "loss": 0.8002, "step": 340 },
+     { "epoch": 0.3791469194312796, "grad_norm": 0.7090775370597839, "learning_rate": 0.0001813622754491018, "loss": 0.8192, "step": 350 },
+     { "epoch": 0.3899796885578876, "grad_norm": 1.0970802307128906, "learning_rate": 0.00018061377245508984, "loss": 0.8516, "step": 360 },
+     { "epoch": 0.4008124576844956, "grad_norm": 0.9633163213729858, "learning_rate": 0.00017986526946107785, "loss": 0.8414, "step": 370 },
+     { "epoch": 0.4116452268111036, "grad_norm": 0.6846926808357239, "learning_rate": 0.00017911676646706587, "loss": 0.8187, "step": 380 },
+     { "epoch": 0.42247799593771157, "grad_norm": 0.7262110710144043, "learning_rate": 0.0001783682634730539, "loss": 0.8572, "step": 390 },
+     { "epoch": 0.4333107650643196, "grad_norm": 0.8537372350692749, "learning_rate": 0.00017761976047904192, "loss": 0.8286, "step": 400 },
+     { "epoch": 0.44414353419092756, "grad_norm": 0.8860271573066711, "learning_rate": 0.00017687125748502996, "loss": 0.8416, "step": 410 },
+     { "epoch": 0.4549763033175355, "grad_norm": 0.7984218597412109, "learning_rate": 0.000176122754491018, "loss": 0.8373, "step": 420 },
+     { "epoch": 0.46580907244414355, "grad_norm": 0.8060943484306335, "learning_rate": 0.000175374251497006, "loss": 0.9165, "step": 430 },
+     { "epoch": 0.4766418415707515, "grad_norm": 0.7871391177177429, "learning_rate": 0.00017462574850299402, "loss": 0.8276, "step": 440 },
+     { "epoch": 0.48747461069735953, "grad_norm": 0.7732688784599304, "learning_rate": 0.00017387724550898203, "loss": 0.8346, "step": 450 },
+     { "epoch": 0.4983073798239675, "grad_norm": 0.9314000606536865, "learning_rate": 0.00017312874251497007, "loss": 0.8291, "step": 460 },
+     { "epoch": 0.5091401489505755, "grad_norm": 0.6721988916397095, "learning_rate": 0.0001723802395209581, "loss": 0.7091, "step": 470 },
+     { "epoch": 0.5199729180771835, "grad_norm": 0.825965940952301, "learning_rate": 0.00017163173652694612, "loss": 0.8934, "step": 480 },
+     { "epoch": 0.5308056872037915, "grad_norm": 0.8427668213844299, "learning_rate": 0.00017088323353293413, "loss": 0.7603, "step": 490 },
+     { "epoch": 0.5416384563303994, "grad_norm": 1.0061259269714355, "learning_rate": 0.00017013473053892217, "loss": 0.8277, "step": 500 },
+     { "epoch": 0.5416384563303994, "eval_loss": 0.8331602811813354, "eval_runtime": 355.9061, "eval_samples_per_second": 4.614, "eval_steps_per_second": 2.307, "step": 500 },
+     { "epoch": 0.5524712254570074, "grad_norm": 0.8820628523826599, "learning_rate": 0.00016938622754491018, "loss": 0.8348, "step": 510 },
+     { "epoch": 0.5633039945836155, "grad_norm": 0.8095284700393677, "learning_rate": 0.00016863772455089822, "loss": 0.9172, "step": 520 },
+     { "epoch": 0.5741367637102234, "grad_norm": 0.6959540843963623, "learning_rate": 0.00016788922155688623, "loss": 0.838, "step": 530 },
+     { "epoch": 0.5849695328368314, "grad_norm": 0.835831880569458, "learning_rate": 0.00016714071856287424, "loss": 0.8887, "step": 540 },
+     { "epoch": 0.5958023019634394, "grad_norm": 0.9289611577987671, "learning_rate": 0.00016639221556886228, "loss": 0.8514, "step": 550 },
+     { "epoch": 0.6066350710900474, "grad_norm": 0.6904628872871399, "learning_rate": 0.00016564371257485032, "loss": 0.8645, "step": 560 },
+     { "epoch": 0.6174678402166554, "grad_norm": 0.8879178762435913, "learning_rate": 0.00016489520958083833, "loss": 0.8201, "step": 570 },
+     { "epoch": 0.6283006093432634, "grad_norm": 0.8411425948143005, "learning_rate": 0.00016414670658682637, "loss": 0.836, "step": 580 },
+     { "epoch": 0.6391333784698714, "grad_norm": 0.8564555644989014, "learning_rate": 0.00016339820359281436, "loss": 0.7724, "step": 590 },
+     { "epoch": 0.6499661475964793, "grad_norm": 0.8382830619812012, "learning_rate": 0.0001626497005988024, "loss": 0.7839, "step": 600 },
+     { "epoch": 0.6607989167230873, "grad_norm": 0.7657437920570374, "learning_rate": 0.00016190119760479043, "loss": 0.7973, "step": 610 },
+     { "epoch": 0.6716316858496953, "grad_norm": 0.7758445143699646, "learning_rate": 0.00016115269461077845, "loss": 0.8111, "step": 620 },
+     { "epoch": 0.6824644549763034, "grad_norm": 1.0041533708572388, "learning_rate": 0.00016040419161676649, "loss": 0.8359, "step": 630 },
+     { "epoch": 0.6932972241029113, "grad_norm": 0.9679577946662903, "learning_rate": 0.0001596556886227545, "loss": 0.8822, "step": 640 },
+     { "epoch": 0.7041299932295193, "grad_norm": 0.8141391277313232, "learning_rate": 0.0001589071856287425, "loss": 0.8714, "step": 650 },
+     { "epoch": 0.7149627623561273, "grad_norm": 0.7982810139656067, "learning_rate": 0.00015815868263473055, "loss": 0.856, "step": 660 },
+     { "epoch": 0.7257955314827352, "grad_norm": 0.7932000160217285, "learning_rate": 0.00015741017964071859, "loss": 0.8405, "step": 670 },
+     { "epoch": 0.7366283006093433, "grad_norm": 0.7269508242607117, "learning_rate": 0.0001566616766467066, "loss": 0.8371, "step": 680 },
+     { "epoch": 0.7474610697359513, "grad_norm": 0.9001722931861877, "learning_rate": 0.0001559131736526946, "loss": 0.8305, "step": 690 },
+     { "epoch": 0.7582938388625592, "grad_norm": 0.6795508861541748, "learning_rate": 0.00015516467065868262, "loss": 0.8324, "step": 700 },
+     { "epoch": 0.7691266079891672, "grad_norm": 0.8868729472160339, "learning_rate": 0.00015441616766467066, "loss": 0.8521, "step": 710 },
+     { "epoch": 0.7799593771157752, "grad_norm": 0.9720478653907776, "learning_rate": 0.0001536676646706587, "loss": 0.7759, "step": 720 },
+     {
525
+ "epoch": 0.7907921462423833,
526
+ "grad_norm": 0.8006075620651245,
527
+ "learning_rate": 0.0001529191616766467,
528
+ "loss": 0.7981,
529
+ "step": 730
530
+ },
531
+ {
532
+ "epoch": 0.8016249153689912,
533
+ "grad_norm": 0.9107721447944641,
534
+ "learning_rate": 0.00015217065868263475,
535
+ "loss": 0.7868,
536
+ "step": 740
537
+ },
538
+ {
539
+ "epoch": 0.8124576844955992,
540
+ "grad_norm": 0.7584466338157654,
541
+ "learning_rate": 0.00015142215568862276,
542
+ "loss": 0.7401,
543
+ "step": 750
544
+ },
545
+ {
546
+ "epoch": 0.8232904536222072,
547
+ "grad_norm": 1.0075221061706543,
548
+ "learning_rate": 0.00015067365269461077,
549
+ "loss": 0.8024,
550
+ "step": 760
551
+ },
552
+ {
553
+ "epoch": 0.8341232227488151,
554
+ "grad_norm": 0.8769344091415405,
555
+ "learning_rate": 0.0001499251497005988,
556
+ "loss": 0.7779,
557
+ "step": 770
558
+ },
559
+ {
560
+ "epoch": 0.8449559918754231,
561
+ "grad_norm": 0.84312903881073,
562
+ "learning_rate": 0.00014917664670658685,
563
+ "loss": 0.8314,
564
+ "step": 780
565
+ },
566
+ {
567
+ "epoch": 0.8557887610020312,
568
+ "grad_norm": 0.8116353750228882,
569
+ "learning_rate": 0.00014842814371257486,
570
+ "loss": 0.8146,
571
+ "step": 790
572
+ },
573
+ {
574
+ "epoch": 0.8666215301286392,
575
+ "grad_norm": 0.8301011919975281,
576
+ "learning_rate": 0.00014767964071856287,
577
+ "loss": 0.7422,
578
+ "step": 800
579
+ },
580
+ {
581
+ "epoch": 0.8774542992552471,
582
+ "grad_norm": 0.8579692244529724,
583
+ "learning_rate": 0.00014693113772455091,
584
+ "loss": 0.7442,
585
+ "step": 810
586
+ },
587
+ {
588
+ "epoch": 0.8882870683818551,
589
+ "grad_norm": 0.7513943910598755,
590
+ "learning_rate": 0.00014618263473053893,
591
+ "loss": 0.7671,
592
+ "step": 820
593
+ },
594
+ {
595
+ "epoch": 0.8991198375084631,
596
+ "grad_norm": 0.9639107584953308,
597
+ "learning_rate": 0.00014543413173652696,
598
+ "loss": 0.7896,
599
+ "step": 830
600
+ },
601
+ {
602
+ "epoch": 0.909952606635071,
603
+ "grad_norm": 0.8897636532783508,
604
+ "learning_rate": 0.00014468562874251498,
605
+ "loss": 0.7613,
606
+ "step": 840
607
+ },
608
+ {
609
+ "epoch": 0.9207853757616791,
610
+ "grad_norm": 0.7998213171958923,
611
+ "learning_rate": 0.000143937125748503,
612
+ "loss": 0.7647,
613
+ "step": 850
614
+ },
615
+ {
616
+ "epoch": 0.9316181448882871,
617
+ "grad_norm": 0.6916050910949707,
618
+ "learning_rate": 0.00014318862275449103,
619
+ "loss": 0.7697,
620
+ "step": 860
621
+ },
622
+ {
623
+ "epoch": 0.942450914014895,
624
+ "grad_norm": 1.0154324769973755,
625
+ "learning_rate": 0.00014244011976047904,
626
+ "loss": 0.7314,
627
+ "step": 870
628
+ },
629
+ {
630
+ "epoch": 0.953283683141503,
631
+ "grad_norm": 0.9787517786026001,
632
+ "learning_rate": 0.00014169161676646708,
633
+ "loss": 0.8047,
634
+ "step": 880
635
+ },
636
+ {
637
+ "epoch": 0.964116452268111,
638
+ "grad_norm": 0.6035457253456116,
639
+ "learning_rate": 0.00014094311377245512,
640
+ "loss": 0.783,
641
+ "step": 890
642
+ },
643
+ {
644
+ "epoch": 0.9749492213947191,
645
+ "grad_norm": 0.940951943397522,
646
+ "learning_rate": 0.0001401946107784431,
647
+ "loss": 0.7741,
648
+ "step": 900
649
+ },
650
+ {
651
+ "epoch": 0.985781990521327,
652
+ "grad_norm": 0.7785654067993164,
653
+ "learning_rate": 0.00013944610778443114,
654
+ "loss": 0.7855,
655
+ "step": 910
656
+ },
657
+ {
658
+ "epoch": 0.996614759647935,
659
+ "grad_norm": 0.8356137275695801,
660
+ "learning_rate": 0.00013869760479041918,
661
+ "loss": 0.8292,
662
+ "step": 920
663
+ },
664
+ {
665
+ "epoch": 1.0064996614759647,
666
+ "grad_norm": 0.6590499877929688,
667
+ "learning_rate": 0.0001379491017964072,
668
+ "loss": 0.6858,
669
+ "step": 930
670
+ },
671
+ {
672
+ "epoch": 1.0173324306025728,
673
+ "grad_norm": 1.0389671325683594,
674
+ "learning_rate": 0.00013720059880239523,
675
+ "loss": 0.6097,
676
+ "step": 940
677
+ },
678
+ {
679
+ "epoch": 1.0281651997291807,
680
+ "grad_norm": 0.9596243500709534,
681
+ "learning_rate": 0.00013645209580838324,
682
+ "loss": 0.5676,
683
+ "step": 950
684
+ },
685
+ {
686
+ "epoch": 1.0389979688557887,
687
+ "grad_norm": 1.0831798315048218,
688
+ "learning_rate": 0.00013570359281437125,
689
+ "loss": 0.6106,
690
+ "step": 960
691
+ },
692
+ {
693
+ "epoch": 1.0498307379823968,
694
+ "grad_norm": 0.92978835105896,
695
+ "learning_rate": 0.0001349550898203593,
696
+ "loss": 0.5924,
697
+ "step": 970
698
+ },
699
+ {
700
+ "epoch": 1.0606635071090047,
701
+ "grad_norm": 0.9672062993049622,
702
+ "learning_rate": 0.0001342065868263473,
703
+ "loss": 0.5496,
704
+ "step": 980
705
+ },
706
+ {
707
+ "epoch": 1.0714962762356128,
708
+ "grad_norm": 1.1402652263641357,
709
+ "learning_rate": 0.00013345808383233534,
710
+ "loss": 0.5871,
711
+ "step": 990
712
+ },
713
+ {
714
+ "epoch": 1.0823290453622207,
715
+ "grad_norm": 1.1109035015106201,
716
+ "learning_rate": 0.00013270958083832335,
717
+ "loss": 0.5424,
718
+ "step": 1000
719
+ },
720
+ {
721
+ "epoch": 1.0823290453622207,
722
+ "eval_loss": 0.8179630041122437,
723
+ "eval_runtime": 357.2769,
724
+ "eval_samples_per_second": 4.596,
725
+ "eval_steps_per_second": 2.298,
726
+ "step": 1000
727
+ },
728
+ {
729
+ "epoch": 1.0931618144888287,
730
+ "grad_norm": 0.8117087483406067,
731
+ "learning_rate": 0.00013196107784431137,
732
+ "loss": 0.5636,
733
+ "step": 1010
734
+ },
735
+ {
736
+ "epoch": 1.1039945836154368,
737
+ "grad_norm": 0.86320561170578,
738
+ "learning_rate": 0.0001312125748502994,
739
+ "loss": 0.5191,
740
+ "step": 1020
741
+ },
742
+ {
743
+ "epoch": 1.1148273527420447,
744
+ "grad_norm": 1.1274133920669556,
745
+ "learning_rate": 0.00013046407185628744,
746
+ "loss": 0.5891,
747
+ "step": 1030
748
+ },
749
+ {
750
+ "epoch": 1.1256601218686526,
751
+ "grad_norm": 1.0116336345672607,
752
+ "learning_rate": 0.00012971556886227546,
753
+ "loss": 0.5579,
754
+ "step": 1040
755
+ },
756
+ {
757
+ "epoch": 1.1364928909952607,
758
+ "grad_norm": 0.9277855157852173,
759
+ "learning_rate": 0.0001289670658682635,
760
+ "loss": 0.5971,
761
+ "step": 1050
762
+ },
763
+ {
764
+ "epoch": 1.1473256601218687,
765
+ "grad_norm": 1.0700503587722778,
766
+ "learning_rate": 0.0001282185628742515,
767
+ "loss": 0.5815,
768
+ "step": 1060
769
+ },
770
+ {
771
+ "epoch": 1.1581584292484766,
772
+ "grad_norm": 0.9346574544906616,
773
+ "learning_rate": 0.00012747005988023952,
774
+ "loss": 0.5472,
775
+ "step": 1070
776
+ },
777
+ {
778
+ "epoch": 1.1689911983750847,
779
+ "grad_norm": 1.047631025314331,
780
+ "learning_rate": 0.00012672155688622756,
781
+ "loss": 0.5479,
782
+ "step": 1080
783
+ },
784
+ {
785
+ "epoch": 1.1798239675016926,
786
+ "grad_norm": 0.9931487441062927,
787
+ "learning_rate": 0.00012597305389221557,
788
+ "loss": 0.5521,
789
+ "step": 1090
790
+ },
791
+ {
792
+ "epoch": 1.1906567366283005,
793
+ "grad_norm": 0.9764857292175293,
794
+ "learning_rate": 0.0001252245508982036,
795
+ "loss": 0.584,
796
+ "step": 1100
797
+ },
798
+ {
799
+ "epoch": 1.2014895057549086,
800
+ "grad_norm": 1.0661903619766235,
801
+ "learning_rate": 0.00012447604790419162,
802
+ "loss": 0.6101,
803
+ "step": 1110
804
+ },
805
+ {
806
+ "epoch": 1.2123222748815166,
807
+ "grad_norm": 1.0962295532226562,
808
+ "learning_rate": 0.00012372754491017963,
809
+ "loss": 0.6028,
810
+ "step": 1120
811
+ },
812
+ {
813
+ "epoch": 1.2231550440081245,
814
+ "grad_norm": 0.9794766306877136,
815
+ "learning_rate": 0.00012297904191616767,
816
+ "loss": 0.5813,
817
+ "step": 1130
818
+ },
819
+ {
820
+ "epoch": 1.2339878131347326,
821
+ "grad_norm": 0.9556275606155396,
822
+ "learning_rate": 0.0001222305389221557,
823
+ "loss": 0.5662,
824
+ "step": 1140
825
+ },
826
+ {
827
+ "epoch": 1.2448205822613405,
828
+ "grad_norm": 1.1200224161148071,
829
+ "learning_rate": 0.0001214820359281437,
830
+ "loss": 0.5642,
831
+ "step": 1150
832
+ },
833
+ {
834
+ "epoch": 1.2556533513879486,
835
+ "grad_norm": 1.0518434047698975,
836
+ "learning_rate": 0.00012073353293413175,
837
+ "loss": 0.6126,
838
+ "step": 1160
839
+ },
840
+ {
841
+ "epoch": 1.2664861205145566,
842
+ "grad_norm": 1.1709963083267212,
843
+ "learning_rate": 0.00011998502994011977,
844
+ "loss": 0.5189,
845
+ "step": 1170
846
+ },
847
+ {
848
+ "epoch": 1.2773188896411645,
849
+ "grad_norm": 0.8867760896682739,
850
+ "learning_rate": 0.00011923652694610778,
851
+ "loss": 0.6098,
852
+ "step": 1180
853
+ },
854
+ {
855
+ "epoch": 1.2881516587677724,
856
+ "grad_norm": 0.9317127466201782,
857
+ "learning_rate": 0.00011848802395209582,
858
+ "loss": 0.5667,
859
+ "step": 1190
860
+ },
861
+ {
862
+ "epoch": 1.2989844278943805,
863
+ "grad_norm": 1.1382100582122803,
864
+ "learning_rate": 0.00011773952095808385,
865
+ "loss": 0.5756,
866
+ "step": 1200
867
+ },
868
+ {
869
+ "epoch": 1.3098171970209884,
870
+ "grad_norm": 0.9819681644439697,
871
+ "learning_rate": 0.00011699101796407186,
872
+ "loss": 0.5922,
873
+ "step": 1210
874
+ },
875
+ {
876
+ "epoch": 1.3206499661475966,
877
+ "grad_norm": 1.0776174068450928,
878
+ "learning_rate": 0.00011624251497005988,
879
+ "loss": 0.5728,
880
+ "step": 1220
881
+ },
882
+ {
883
+ "epoch": 1.3314827352742045,
884
+ "grad_norm": 1.0137302875518799,
885
+ "learning_rate": 0.0001154940119760479,
886
+ "loss": 0.5603,
887
+ "step": 1230
888
+ },
889
+ {
890
+ "epoch": 1.3423155044008124,
891
+ "grad_norm": 1.1223585605621338,
892
+ "learning_rate": 0.00011474550898203593,
893
+ "loss": 0.5639,
894
+ "step": 1240
895
+ },
896
+ {
897
+ "epoch": 1.3531482735274205,
898
+ "grad_norm": 0.8942229747772217,
899
+ "learning_rate": 0.00011399700598802396,
900
+ "loss": 0.586,
901
+ "step": 1250
902
+ },
903
+ {
904
+ "epoch": 1.3639810426540284,
905
+ "grad_norm": 1.225698709487915,
906
+ "learning_rate": 0.00011324850299401197,
907
+ "loss": 0.563,
908
+ "step": 1260
909
+ },
910
+ {
911
+ "epoch": 1.3748138117806366,
912
+ "grad_norm": 1.159463882446289,
913
+ "learning_rate": 0.00011250000000000001,
914
+ "loss": 0.5898,
915
+ "step": 1270
916
+ },
917
+ {
918
+ "epoch": 1.3856465809072445,
919
+ "grad_norm": 1.0059807300567627,
920
+ "learning_rate": 0.00011175149700598804,
921
+ "loss": 0.6096,
922
+ "step": 1280
923
+ },
924
+ {
925
+ "epoch": 1.3964793500338524,
926
+ "grad_norm": 1.1433062553405762,
927
+ "learning_rate": 0.00011100299401197605,
928
+ "loss": 0.5411,
929
+ "step": 1290
930
+ },
931
+ {
932
+ "epoch": 1.4073121191604603,
933
+ "grad_norm": 1.0282905101776123,
934
+ "learning_rate": 0.00011025449101796407,
935
+ "loss": 0.5928,
936
+ "step": 1300
937
+ },
938
+ {
939
+ "epoch": 1.4181448882870684,
940
+ "grad_norm": 0.8389853835105896,
941
+ "learning_rate": 0.00010950598802395211,
942
+ "loss": 0.5657,
943
+ "step": 1310
944
+ },
945
+ {
946
+ "epoch": 1.4289776574136763,
947
+ "grad_norm": 1.132350206375122,
948
+ "learning_rate": 0.00010875748502994012,
949
+ "loss": 0.6196,
950
+ "step": 1320
951
+ },
952
+ {
953
+ "epoch": 1.4398104265402845,
954
+ "grad_norm": 1.1093621253967285,
955
+ "learning_rate": 0.00010800898203592815,
956
+ "loss": 0.5845,
957
+ "step": 1330
958
+ },
959
+ {
960
+ "epoch": 1.4506431956668924,
961
+ "grad_norm": 1.3198816776275635,
962
+ "learning_rate": 0.00010726047904191616,
963
+ "loss": 0.5711,
964
+ "step": 1340
965
+ },
966
+ {
967
+ "epoch": 1.4614759647935003,
968
+ "grad_norm": 0.8968690037727356,
969
+ "learning_rate": 0.0001065119760479042,
970
+ "loss": 0.6075,
971
+ "step": 1350
972
+ },
973
+ {
974
+ "epoch": 1.4723087339201082,
975
+ "grad_norm": 1.0248963832855225,
976
+ "learning_rate": 0.00010576347305389222,
977
+ "loss": 0.5869,
978
+ "step": 1360
979
+ },
980
+ {
981
+ "epoch": 1.4831415030467163,
982
+ "grad_norm": 1.2115412950515747,
983
+ "learning_rate": 0.00010501497005988024,
984
+ "loss": 0.549,
985
+ "step": 1370
986
+ },
987
+ {
988
+ "epoch": 1.4939742721733242,
989
+ "grad_norm": 1.1320476531982422,
990
+ "learning_rate": 0.00010426646706586826,
991
+ "loss": 0.5661,
992
+ "step": 1380
993
+ },
994
+ {
995
+ "epoch": 1.5048070412999324,
996
+ "grad_norm": 1.0099844932556152,
997
+ "learning_rate": 0.0001035179640718563,
998
+ "loss": 0.5953,
999
+ "step": 1390
1000
+ },
1001
+ {
1002
+ "epoch": 1.5156398104265403,
1003
+ "grad_norm": 0.9809553623199463,
1004
+ "learning_rate": 0.00010276946107784431,
1005
+ "loss": 0.578,
1006
+ "step": 1400
1007
+ },
1008
+ {
1009
+ "epoch": 1.5264725795531482,
1010
+ "grad_norm": 1.4169446229934692,
1011
+ "learning_rate": 0.00010202095808383234,
1012
+ "loss": 0.6173,
1013
+ "step": 1410
1014
+ },
1015
+ {
1016
+ "epoch": 1.537305348679756,
1017
+ "grad_norm": 1.1033852100372314,
1018
+ "learning_rate": 0.00010127245508982038,
1019
+ "loss": 0.5917,
1020
+ "step": 1420
1021
+ },
1022
+ {
1023
+ "epoch": 1.5481381178063642,
1024
+ "grad_norm": 1.1163372993469238,
1025
+ "learning_rate": 0.00010052395209580839,
1026
+ "loss": 0.589,
1027
+ "step": 1430
1028
+ },
1029
+ {
1030
+ "epoch": 1.5589708869329724,
1031
+ "grad_norm": 0.9786676168441772,
1032
+ "learning_rate": 9.977544910179641e-05,
1033
+ "loss": 0.5425,
1034
+ "step": 1440
1035
+ },
1036
+ {
1037
+ "epoch": 1.5698036560595803,
1038
+ "grad_norm": 1.034001111984253,
1039
+ "learning_rate": 9.902694610778444e-05,
1040
+ "loss": 0.5467,
1041
+ "step": 1450
1042
+ },
1043
+ {
1044
+ "epoch": 1.5806364251861882,
1045
+ "grad_norm": 0.8697665929794312,
1046
+ "learning_rate": 9.827844311377245e-05,
1047
+ "loss": 0.5882,
1048
+ "step": 1460
1049
+ },
1050
+ {
1051
+ "epoch": 1.591469194312796,
1052
+ "grad_norm": 1.0091935396194458,
1053
+ "learning_rate": 9.752994011976049e-05,
1054
+ "loss": 0.573,
1055
+ "step": 1470
1056
+ },
1057
+ {
1058
+ "epoch": 1.6023019634394042,
1059
+ "grad_norm": 1.0126501321792603,
1060
+ "learning_rate": 9.678143712574852e-05,
1061
+ "loss": 0.6083,
1062
+ "step": 1480
1063
+ },
1064
+ {
1065
+ "epoch": 1.6131347325660121,
1066
+ "grad_norm": 0.9271785020828247,
1067
+ "learning_rate": 9.603293413173653e-05,
1068
+ "loss": 0.5564,
1069
+ "step": 1490
1070
+ },
1071
+ {
1072
+ "epoch": 1.6239675016926203,
1073
+ "grad_norm": 1.0736253261566162,
1074
+ "learning_rate": 9.528443113772455e-05,
1075
+ "loss": 0.5696,
1076
+ "step": 1500
1077
+ },
1078
+ {
1079
+ "epoch": 1.6239675016926203,
1080
+ "eval_loss": 0.7986094355583191,
1081
+ "eval_runtime": 358.2761,
1082
+ "eval_samples_per_second": 4.583,
1083
+ "eval_steps_per_second": 2.292,
1084
+ "step": 1500
1085
+ },
1086
+ {
1087
+ "epoch": 1.6348002708192282,
1088
+ "grad_norm": 0.9671568870544434,
1089
+ "learning_rate": 9.453592814371258e-05,
1090
+ "loss": 0.5994,
1091
+ "step": 1510
1092
+ },
1093
+ {
1094
+ "epoch": 1.645633039945836,
1095
+ "grad_norm": 0.9636701345443726,
1096
+ "learning_rate": 9.37874251497006e-05,
1097
+ "loss": 0.6096,
1098
+ "step": 1520
1099
+ },
1100
+ {
1101
+ "epoch": 1.656465809072444,
1102
+ "grad_norm": 1.1323844194412231,
1103
+ "learning_rate": 9.303892215568863e-05,
1104
+ "loss": 0.5981,
1105
+ "step": 1530
1106
+ },
1107
+ {
1108
+ "epoch": 1.6672985781990521,
1109
+ "grad_norm": 1.0002387762069702,
1110
+ "learning_rate": 9.229041916167665e-05,
1111
+ "loss": 0.5807,
1112
+ "step": 1540
1113
+ },
1114
+ {
1115
+ "epoch": 1.6781313473256603,
1116
+ "grad_norm": 1.2000038623809814,
1117
+ "learning_rate": 9.154191616766468e-05,
1118
+ "loss": 0.5583,
1119
+ "step": 1550
1120
+ },
1121
+ {
1122
+ "epoch": 1.6889641164522682,
1123
+ "grad_norm": 1.153903841972351,
1124
+ "learning_rate": 9.079341317365269e-05,
1125
+ "loss": 0.6237,
1126
+ "step": 1560
1127
+ },
1128
+ {
1129
+ "epoch": 1.699796885578876,
1130
+ "grad_norm": 1.0791847705841064,
1131
+ "learning_rate": 9.004491017964072e-05,
1132
+ "loss": 0.5457,
1133
+ "step": 1570
1134
+ },
1135
+ {
1136
+ "epoch": 1.710629654705484,
1137
+ "grad_norm": 1.1212618350982666,
1138
+ "learning_rate": 8.929640718562875e-05,
1139
+ "loss": 0.551,
1140
+ "step": 1580
1141
+ },
1142
+ {
1143
+ "epoch": 1.721462423832092,
1144
+ "grad_norm": 1.219691514968872,
1145
+ "learning_rate": 8.854790419161677e-05,
1146
+ "loss": 0.6027,
1147
+ "step": 1590
1148
+ },
1149
+ {
1150
+ "epoch": 1.7322951929587,
1151
+ "grad_norm": 1.066247820854187,
1152
+ "learning_rate": 8.779940119760479e-05,
1153
+ "loss": 0.5739,
1154
+ "step": 1600
1155
+ },
1156
+ {
1157
+ "epoch": 1.7431279620853082,
1158
+ "grad_norm": 1.070609450340271,
1159
+ "learning_rate": 8.705089820359282e-05,
1160
+ "loss": 0.6113,
1161
+ "step": 1610
1162
+ },
1163
+ {
1164
+ "epoch": 1.753960731211916,
1165
+ "grad_norm": 1.377456784248352,
1166
+ "learning_rate": 8.630239520958084e-05,
1167
+ "loss": 0.5648,
1168
+ "step": 1620
1169
+ },
1170
+ {
1171
+ "epoch": 1.764793500338524,
1172
+ "grad_norm": 1.0471181869506836,
1173
+ "learning_rate": 8.555389221556887e-05,
1174
+ "loss": 0.5514,
1175
+ "step": 1630
1176
+ },
1177
+ {
1178
+ "epoch": 1.775626269465132,
1179
+ "grad_norm": 1.2327128648757935,
1180
+ "learning_rate": 8.480538922155688e-05,
1181
+ "loss": 0.6072,
1182
+ "step": 1640
1183
+ },
1184
+ {
1185
+ "epoch": 1.78645903859174,
1186
+ "grad_norm": 1.004497766494751,
1187
+ "learning_rate": 8.405688622754492e-05,
1188
+ "loss": 0.5601,
1189
+ "step": 1650
1190
+ },
1191
+ {
1192
+ "epoch": 1.797291807718348,
1193
+ "grad_norm": 1.2862775325775146,
1194
+ "learning_rate": 8.330838323353294e-05,
1195
+ "loss": 0.6073,
1196
+ "step": 1660
1197
+ },
1198
+ {
1199
+ "epoch": 1.808124576844956,
1200
+ "grad_norm": 1.0752897262573242,
1201
+ "learning_rate": 8.255988023952096e-05,
1202
+ "loss": 0.6079,
1203
+ "step": 1670
1204
+ },
1205
+ {
1206
+ "epoch": 1.818957345971564,
1207
+ "grad_norm": 1.031568169593811,
1208
+ "learning_rate": 8.1811377245509e-05,
1209
+ "loss": 0.5701,
1210
+ "step": 1680
1211
+ },
1212
+ {
1213
+ "epoch": 1.829790115098172,
1214
+ "grad_norm": 1.2067883014678955,
1215
+ "learning_rate": 8.1062874251497e-05,
1216
+ "loss": 0.6024,
1217
+ "step": 1690
1218
+ },
1219
+ {
1220
+ "epoch": 1.8406228842247798,
1221
+ "grad_norm": 1.2873584032058716,
1222
+ "learning_rate": 8.031437125748503e-05,
1223
+ "loss": 0.6033,
1224
+ "step": 1700
1225
+ },
1226
+ {
1227
+ "epoch": 1.851455653351388,
1228
+ "grad_norm": 1.1230562925338745,
1229
+ "learning_rate": 7.956586826347306e-05,
1230
+ "loss": 0.5534,
1231
+ "step": 1710
1232
+ },
1233
+ {
1234
+ "epoch": 1.862288422477996,
1235
+ "grad_norm": 1.275429129600525,
1236
+ "learning_rate": 7.881736526946108e-05,
1237
+ "loss": 0.5483,
1238
+ "step": 1720
1239
+ },
1240
+ {
1241
+ "epoch": 1.873121191604604,
1242
+ "grad_norm": 1.1561681032180786,
1243
+ "learning_rate": 7.806886227544911e-05,
1244
+ "loss": 0.5948,
1245
+ "step": 1730
1246
+ },
1247
+ {
1248
+ "epoch": 1.883953960731212,
1249
+ "grad_norm": 1.0285365581512451,
1250
+ "learning_rate": 7.732035928143713e-05,
1251
+ "loss": 0.5843,
1252
+ "step": 1740
1253
+ },
1254
+ {
1255
+ "epoch": 1.8947867298578198,
1256
+ "grad_norm": 1.257944107055664,
1257
+ "learning_rate": 7.657185628742516e-05,
1258
+ "loss": 0.5672,
1259
+ "step": 1750
1260
+ },
1261
+ {
1262
+ "epoch": 1.9056194989844277,
1263
+ "grad_norm": 1.2069061994552612,
1264
+ "learning_rate": 7.582335329341318e-05,
1265
+ "loss": 0.6312,
1266
+ "step": 1760
1267
+ },
1268
+ {
1269
+ "epoch": 1.9164522681110359,
1270
+ "grad_norm": 0.946007251739502,
1271
+ "learning_rate": 7.50748502994012e-05,
1272
+ "loss": 0.6028,
1273
+ "step": 1770
1274
+ },
1275
+ {
1276
+ "epoch": 1.927285037237644,
1277
+ "grad_norm": 1.3141242265701294,
1278
+ "learning_rate": 7.432634730538922e-05,
1279
+ "loss": 0.5762,
1280
+ "step": 1780
1281
+ },
1282
+ {
1283
+ "epoch": 1.938117806364252,
1284
+ "grad_norm": 0.9737468957901001,
1285
+ "learning_rate": 7.357784431137726e-05,
1286
+ "loss": 0.5637,
1287
+ "step": 1790
1288
+ },
1289
+ {
1290
+ "epoch": 1.9489505754908598,
1291
+ "grad_norm": 1.0719372034072876,
1292
+ "learning_rate": 7.282934131736527e-05,
1293
+ "loss": 0.5685,
1294
+ "step": 1800
1295
+ },
1296
+ {
1297
+ "epoch": 1.9597833446174677,
1298
+ "grad_norm": 0.9777527451515198,
1299
+ "learning_rate": 7.20808383233533e-05,
1300
+ "loss": 0.5823,
1301
+ "step": 1810
1302
+ },
1303
+ {
1304
+ "epoch": 1.9706161137440759,
1305
+ "grad_norm": 1.019610047340393,
1306
+ "learning_rate": 7.133233532934132e-05,
1307
+ "loss": 0.5342,
1308
+ "step": 1820
1309
+ },
1310
+ {
1311
+ "epoch": 1.9814488828706838,
1312
+ "grad_norm": 1.2895872592926025,
1313
+ "learning_rate": 7.058383233532935e-05,
1314
+ "loss": 0.5625,
1315
+ "step": 1830
1316
+ },
1317
+ {
1318
+ "epoch": 1.992281651997292,
1319
+ "grad_norm": 1.1473089456558228,
1320
+ "learning_rate": 6.983532934131737e-05,
1321
+ "loss": 0.5476,
1322
+ "step": 1840
1323
+ },
1324
+ {
1325
+ "epoch": 2.0021665538253215,
1326
+ "grad_norm": 0.9660665392875671,
1327
+ "learning_rate": 6.908682634730538e-05,
1328
+ "loss": 0.5755,
1329
+ "step": 1850
1330
+ },
1331
+ {
1332
+ "epoch": 2.0129993229519294,
1333
+ "grad_norm": 1.2142918109893799,
1334
+ "learning_rate": 6.833832335329342e-05,
1335
+ "loss": 0.3712,
1336
+ "step": 1860
1337
+ },
1338
+ {
1339
+ "epoch": 2.0238320920785378,
1340
+ "grad_norm": 1.4106266498565674,
1341
+ "learning_rate": 6.758982035928145e-05,
1342
+ "loss": 0.3645,
1343
+ "step": 1870
1344
+ },
1345
+ {
1346
+ "epoch": 2.0346648612051457,
1347
+ "grad_norm": 1.2526434659957886,
1348
+ "learning_rate": 6.684131736526946e-05,
1349
+ "loss": 0.3634,
1350
+ "step": 1880
1351
+ },
1352
+ {
1353
+ "epoch": 2.0454976303317536,
1354
+ "grad_norm": 1.2345237731933594,
1355
+ "learning_rate": 6.609281437125749e-05,
1356
+ "loss": 0.3181,
1357
+ "step": 1890
1358
+ },
1359
+ {
1360
+ "epoch": 2.0563303994583615,
1361
+ "grad_norm": 1.1664937734603882,
1362
+ "learning_rate": 6.534431137724551e-05,
1363
+ "loss": 0.3412,
1364
+ "step": 1900
1365
+ },
1366
+ {
1367
+ "epoch": 2.0671631685849694,
1368
+ "grad_norm": 1.3861303329467773,
1369
+ "learning_rate": 6.459580838323354e-05,
1370
+ "loss": 0.348,
1371
+ "step": 1910
1372
+ },
1373
+ {
1374
+ "epoch": 2.0779959377115773,
1375
+ "grad_norm": 1.4952672719955444,
1376
+ "learning_rate": 6.384730538922156e-05,
1377
+ "loss": 0.3193,
1378
+ "step": 1920
1379
+ },
1380
+ {
1381
+ "epoch": 2.0888287068381857,
1382
+ "grad_norm": 1.3477551937103271,
1383
+ "learning_rate": 6.309880239520959e-05,
1384
+ "loss": 0.3499,
1385
+ "step": 1930
1386
+ },
1387
+ {
1388
+ "epoch": 2.0996614759647936,
1389
+ "grad_norm": 1.5403518676757812,
1390
+ "learning_rate": 6.235029940119761e-05,
1391
+ "loss": 0.3742,
1392
+ "step": 1940
1393
+ },
1394
+ {
1395
+ "epoch": 2.1104942450914015,
1396
+ "grad_norm": 1.2402806282043457,
1397
+ "learning_rate": 6.160179640718562e-05,
1398
+ "loss": 0.3575,
1399
+ "step": 1950
1400
+ },
1401
+ {
1402
+ "epoch": 2.1213270142180094,
1403
+ "grad_norm": 1.0961482524871826,
1404
+ "learning_rate": 6.085329341317365e-05,
1405
+ "loss": 0.355,
1406
+ "step": 1960
1407
+ },
1408
+ {
1409
+ "epoch": 2.1321597833446173,
1410
+ "grad_norm": 1.1491034030914307,
1411
+ "learning_rate": 6.010479041916168e-05,
1412
+ "loss": 0.3555,
1413
+ "step": 1970
1414
+ },
1415
+ {
1416
+ "epoch": 2.1429925524712257,
1417
+ "grad_norm": 1.6308982372283936,
1418
+ "learning_rate": 5.9356287425149706e-05,
1419
+ "loss": 0.3408,
1420
+ "step": 1980
1421
+ },
1422
+ {
1423
+ "epoch": 2.1538253215978336,
1424
+ "grad_norm": 1.481628656387329,
1425
+ "learning_rate": 5.8607784431137725e-05,
1426
+ "loss": 0.3814,
1427
+ "step": 1990
1428
+ },
1429
+ {
1430
+ "epoch": 2.1646580907244415,
1431
+ "grad_norm": 1.2989139556884766,
1432
+ "learning_rate": 5.785928143712576e-05,
1433
+ "loss": 0.343,
1434
+ "step": 2000
1435
+ },
1436
+ {
1437
+ "epoch": 2.1646580907244415,
1438
+ "eval_loss": 0.8451017141342163,
1439
+ "eval_runtime": 357.77,
1440
+ "eval_samples_per_second": 4.59,
1441
+ "eval_steps_per_second": 2.295,
1442
+ "step": 2000
1443
+ }
1444
+ ],
1445
+ "logging_steps": 10,
1446
+ "max_steps": 2772,
1447
+ "num_input_tokens_seen": 0,
1448
+ "num_train_epochs": 3,
1449
+ "save_steps": 500,
1450
+ "stateful_callbacks": {
1451
+ "TrainerControl": {
1452
+ "args": {
1453
+ "should_epoch_stop": false,
1454
+ "should_evaluate": false,
1455
+ "should_log": false,
1456
+ "should_save": true,
1457
+ "should_training_stop": false
1458
+ },
1459
+ "attributes": {}
1460
+ }
1461
+ },
1462
+ "total_flos": 7.035689546180198e+17,
1463
+ "train_batch_size": 2,
1464
+ "trial_name": null,
1465
+ "trial_params": null
1466
+ }
checkpoint-2000/training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fb57b3addd13a91af4f53634dd1e6a17845286b1cee5e38cd21da8e2bf179c7f
3
+ size 5304
checkpoint-2500/README.md ADDED
@@ -0,0 +1,207 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model: BioMistral/BioMistral-7B
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:BioMistral/BioMistral-7B
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.16.0
checkpoint-2500/adapter_config.json ADDED
@@ -0,0 +1,41 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "BioMistral/BioMistral-7B",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 32,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.1,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "qalora_group_size": 16,
24
+ "r": 16,
25
+ "rank_pattern": {},
26
+ "revision": null,
27
+ "target_modules": [
28
+ "down_proj",
29
+ "k_proj",
30
+ "gate_proj",
31
+ "up_proj",
32
+ "q_proj",
33
+ "o_proj",
34
+ "v_proj"
35
+ ],
36
+ "task_type": "CAUSAL_LM",
37
+ "trainable_token_indices": null,
38
+ "use_dora": false,
39
+ "use_qalora": false,
40
+ "use_rslora": false
41
+ }
checkpoint-2500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cf6789607635a0fefb945b61a36bfed3dbf3bf73010588ae8a722b6f9b06b73d
3
+ size 167832240
checkpoint-2500/chat_template.jinja ADDED
@@ -0,0 +1 @@
 
 
1
+ {{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}