ayuwal12 committed
Commit 6690520 · verified · 1 Parent(s): 9a76225

Upload LoRA fine-tuned BioMistral-7B model

This view is limited to 50 files because it contains too many changes. See the raw diff for the full change set.
Files changed (50)
  1. README.md +276 -0
  2. adapter_config.json +41 -0
  3. adapter_model.safetensors +3 -0
  4. chat_template.jinja +1 -0
  5. checkpoint-1000/README.md +207 -0
  6. checkpoint-1000/adapter_config.json +41 -0
  7. checkpoint-1000/adapter_model.safetensors +3 -0
  8. checkpoint-1000/chat_template.jinja +1 -0
  9. checkpoint-1000/optimizer.pt +3 -0
  10. checkpoint-1000/rng_state.pth +3 -0
  11. checkpoint-1000/scaler.pt +3 -0
  12. checkpoint-1000/scheduler.pt +3 -0
  13. checkpoint-1000/special_tokens_map.json +24 -0
  14. checkpoint-1000/tokenizer.json +0 -0
  15. checkpoint-1000/tokenizer.model +3 -0
  16. checkpoint-1000/tokenizer_config.json +44 -0
  17. checkpoint-1000/trainer_state.json +750 -0
  18. checkpoint-1000/training_args.bin +3 -0
  19. checkpoint-1500/README.md +207 -0
  20. checkpoint-1500/adapter_config.json +41 -0
  21. checkpoint-1500/adapter_model.safetensors +3 -0
  22. checkpoint-1500/chat_template.jinja +1 -0
  23. checkpoint-1500/optimizer.pt +3 -0
  24. checkpoint-1500/rng_state.pth +3 -0
  25. checkpoint-1500/scaler.pt +3 -0
  26. checkpoint-1500/scheduler.pt +3 -0
  27. checkpoint-1500/special_tokens_map.json +24 -0
  28. checkpoint-1500/tokenizer.json +0 -0
  29. checkpoint-1500/tokenizer.model +3 -0
  30. checkpoint-1500/tokenizer_config.json +44 -0
  31. checkpoint-1500/trainer_state.json +1108 -0
  32. checkpoint-1500/training_args.bin +3 -0
  33. checkpoint-2000/README.md +207 -0
  34. checkpoint-2000/adapter_config.json +41 -0
  35. checkpoint-2000/adapter_model.safetensors +3 -0
  36. checkpoint-2000/chat_template.jinja +1 -0
  37. checkpoint-2000/optimizer.pt +3 -0
  38. checkpoint-2000/rng_state.pth +3 -0
  39. checkpoint-2000/scaler.pt +3 -0
  40. checkpoint-2000/scheduler.pt +3 -0
  41. checkpoint-2000/special_tokens_map.json +24 -0
  42. checkpoint-2000/tokenizer.json +0 -0
  43. checkpoint-2000/tokenizer.model +3 -0
  44. checkpoint-2000/tokenizer_config.json +44 -0
  45. checkpoint-2000/trainer_state.json +1466 -0
  46. checkpoint-2000/training_args.bin +3 -0
  47. checkpoint-2500/README.md +207 -0
  48. checkpoint-2500/adapter_config.json +41 -0
  49. checkpoint-2500/adapter_model.safetensors +3 -0
  50. checkpoint-2500/chat_template.jinja +1 -0
README.md ADDED
@@ -0,0 +1,276 @@
# BioMistral-7B LoRA Fine-tuned on MedQuAD

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B) for medical question answering, trained on the MedQuAD dataset from Kaggle.

## Model Description

- **Base Model**: BioMistral/BioMistral-7B
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Model Type**: Causal Language Model
- **Training Dataset**: MedQuAD (Medical Question Answering Dataset)
- **Domain**: Medical/Biomedical
- **Language**: English
- **License**: Apache 2.0 (inherited from the base model)

## Dataset Information

### MedQuAD Dataset
- **Source**: [MedQuAD on Kaggle](https://www.kaggle.com/datasets/jpmiller/medquad)
- **Full Name**: Medical Question Answering Dataset
- **Description**: A collection of medical questions and answers from trusted medical sources
- **Training Examples**: 14,770 question-answer pairs
- **Validation Examples**: 1,642 question-answer pairs
- **Format**: Instruction-Input-Output triplets for medical Q&A (one example is sketched below)

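A minimal sketch of how a MedQuAD question-answer pair could be rendered into this triplet format; the `format_example` helper is illustrative, since the actual preprocessing script is not included in this repository:

```python
# Illustrative only: mirrors the prompt layout used in the Usage examples
# below; the real preprocessing code is not part of this repo.
def format_example(question: str, answer: str, context: str = "") -> str:
    if context.strip():
        return (f"### Instruction:\n{question}\n\n"
                f"### Input:\n{context}\n\n"
                f"### Response:\n{answer}")
    return f"### Instruction:\n{question}\n\n### Response:\n{answer}"

print(format_example(
    "What is glaucoma?",
    "Glaucoma is a group of eye diseases that damage the optic nerve.",
))
```
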
### Data Sources (MedQuAD)
The MedQuAD dataset contains medical information from various authoritative sources including:
- National Institutes of Health (NIH)
- National Cancer Institute (NCI)
- National Institute of Mental Health (NIMH)
- Centers for Disease Control and Prevention (CDC)
- And other trusted medical organizations

## Training Details

### Training Configuration
- **Training Steps**: 2,772 (3 epochs)
- **Batch Size**: 2 per device
- **Gradient Accumulation**: 8 steps
- **Effective Batch Size**: 16
- **Learning Rate**: 2e-4
- **Warmup Steps**: 100
- **Max Sequence Length**: 512
- **Optimizer**: AdamW
- **Precision**: FP16

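A sketch of how these settings map onto `transformers.TrainingArguments`. The output directory and the evaluation/save/logging cadence match the `trainer_state.json` shipped with the checkpoints below; the actual training script is not included, so treat this as a reconstruction:

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the documented hyperparameters.
training_args = TrainingArguments(
    output_dir="./biomistral-lora-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size 2 * 8 = 16
    learning_rate=2e-4,
    warmup_steps=100,
    fp16=True,
    eval_strategy="steps",           # `evaluation_strategy` on older transformers
    eval_steps=500,
    save_steps=500,
    logging_steps=10,
)
```
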
### LoRA Configuration
- **LoRA Rank (r)**: 16
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.1
- **Target Modules**: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- **Trainable Parameters**: ~41.9M, about 0.6% of the 7B base (consistent with the ~168 MB fp32 adapter file)

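The same hyperparameters appear in the `adapter_config.json` shipped in this repo; reconstructed as a PEFT config, the setup is roughly:

```python
from peft import LoraConfig

# Mirrors the values in adapter_config.json below.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# model = get_peft_model(base_model, lora_config)  # then train as usual
```
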
### Training Results
| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 500  | 0.8277        | 0.8332          |
| 1000 | 0.5424        | 0.8180          |
| 1500 | 0.5696        | 0.7986          |
| 2000 | 0.3430        | 0.8451          |
| 2500 | 0.3184        | 0.8488          |

**Final Validation Loss**: 0.8488

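Since the loss is a token-level cross-entropy, it converts to perplexity via exp(loss); the final validation loss therefore corresponds to a perplexity of roughly 2.34:

```python
import math

final_val_loss = 0.8488
print(f"Validation perplexity: {math.exp(final_val_loss):.2f}")  # ~2.34
```
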
## Installation

```bash
pip install transformers peft torch accelerate bitsandbytes
```

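`bitsandbytes` is only needed if you want to load the base model quantized to save GPU memory; the usage examples below load in fp16. A 4-bit loading sketch (optional, and not how the model was trained):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "BioMistral/BioMistral-7B",
    quantization_config=bnb_config,
    device_map="auto",
)
```
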
## Usage

### Option 1: Using the Full Fine-tuned Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model
model_name = "ayuwal12/biomistral-7b-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16
)

def generate_medical_response(question, context="", max_new_tokens=256):
    # Format the prompt for medical Q&A (same layout as the training data)
    if context.strip():
        prompt = f"### Instruction:\n{question}\n\n### Input:\n{context}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{question}\n\n### Response:\n"

    # Tokenize and generate
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    # Decode and keep only the text after the response marker
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:\n")[-1].strip()

# Example usage
response = generate_medical_response("What is diabetes and what are its main types?")
print(response)
```

### Option 2: Using LoRA Adapters (Recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model_name = "BioMistral/BioMistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype=torch.float16
)

# Load LoRA adapters
lora_model_name = "ayuwal12/biomistral-7b-lora-adapters"
model = PeftModel.from_pretrained(base_model, lora_model_name)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def generate_medical_response(question, context="", max_new_tokens=256):
    if context.strip():
        prompt = f"### Instruction:\n{question}\n\n### Input:\n{context}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{question}\n\n### Response:\n"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:\n")[-1].strip()

# Example usage
response = generate_medical_response("What are the symptoms of hypertension?")
print(response)
```

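Continuing from the Option 2 snippet, the adapters can also be folded into the base weights so inference no longer needs the PEFT wrapper (`merge_and_unload` is the standard PEFT method; the save path here is just an example):

```python
# Merge the LoRA deltas into the base model for plain transformers inference.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./biomistral-7b-medquad-merged")  # example path
tokenizer.save_pretrained("./biomistral-7b-medquad-merged")
```
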
## Example Medical Questions

### General Medical Questions
```python
question = "What is hypertension and how is it diagnosed?"
response = generate_medical_response(question)
```

### Symptoms and Conditions
```python
question = "What are the common symptoms of type 2 diabetes?"
response = generate_medical_response(question)
```

### Treatment and Management
```python
question = "How is high blood pressure treated?"
response = generate_medical_response(question)
```

### With Medical Context
```python
question = "What should I know about this condition?"
context = "Patient has been diagnosed with stage 1 hypertension"
response = generate_medical_response(question, context)
```

## Model Performance

- **Training Loss**: Decreased from 0.83 to 0.32 over 3 epochs
- **Validation Loss**: Reached its minimum of ~0.80 around step 1500, then drifted up to ~0.85 by step 2500
- **Convergence**: The model learned the task quickly; the late rise in validation loss suggests mild overfitting in the final epoch, so the step-1500 checkpoint may generalize best
- **Memory Efficiency**: Trains only ~41.9M of ~7.2B parameters (~0.6%) via LoRA
- **Domain**: Specialized for medical question answering

## Capabilities

This model is intended to be strongest at:
- ✅ **Medical Question Answering**: Trained specifically on medical Q&A pairs
- ✅ **Disease Information**: Provides information about various medical conditions
- ✅ **Symptom Analysis**: Explains symptoms and their significance
- ✅ **Treatment Overview**: Discusses general treatment approaches
- ✅ **Medical Terminology**: Understands and explains medical terms

## Limitations

- Based on BioMistral-7B and inherits its limitations
- Trained on the MedQuAD dataset, so it may not cover all medical domains equally
- **Not for diagnosis**: Cannot replace professional medical evaluation
- **Information only**: Provides general medical information, not personalized advice
- May not reflect the most recent medical research (depends on the training data cutoff)

## Intended Use

This model is designed for:
- 📚 **Educational purposes** in medical and healthcare domains
- 🔬 **Research applications** in biomedical NLP
- 💡 **Medical information retrieval** systems
- 🏥 **Healthcare chatbots** (with appropriate disclaimers)
- 📖 **Medical knowledge base** applications

## Ethical Considerations & Medical Disclaimer

⚠️ **IMPORTANT MEDICAL DISCLAIMER**:
- This model is for **educational and research purposes only**
- **NOT for medical diagnosis** or treatment decisions
- Always consult qualified healthcare professionals for medical advice
- AI-generated medical content may contain errors or biases
- Do not use this model in emergency medical situations
- Individual medical conditions require personalized professional care

## Dataset Citation

```bibtex
@misc{medquad,
  title        = {MedQuAD: Medical Question Answering Dataset},
  author       = {Ben Abacha, Asma and Mrabet, Yassine and Zhang, Yuhao and Shivade, Chaitanya and Langlotz, Curtis and Demner-Fushman, Dina},
  year         = {2019},
  howpublished = {Available on Kaggle: https://www.kaggle.com/datasets/jpmiller/medquad}
}
```

## Model Citation

If you use this model, please cite:

```bibtex
@misc{biomistral-medquad-lora,
  title        = {BioMistral-7B LoRA Fine-tuned on MedQuAD},
  author       = {Ayuwal},
  year         = {2024},
  howpublished = {https://huggingface.co/ayuwal12/biomistral-7b-finetuned}
}
```

## Acknowledgments

- **Base model**: [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B)
- **Training dataset**: [MedQuAD](https://www.kaggle.com/datasets/jpmiller/medquad)
- **LoRA implementation**: [PEFT](https://github.com/huggingface/peft)
- **Training framework**: [Transformers](https://github.com/huggingface/transformers)
- **Original MedQuAD authors**: Ben Abacha et al.

## Contact

For questions or issues, please open an issue on the model repository.

---

*This model was trained on the MedQuAD dataset and is intended for educational and research purposes in the medical domain. Always consult healthcare professionals for medical advice.*
adapter_config.json ADDED
@@ -0,0 +1,41 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "BioMistral/BioMistral-7B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "k_proj",
    "gate_proj",
    "up_proj",
    "q_proj",
    "o_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
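
The adapter metadata above can be inspected without downloading the 7B base weights (a small sketch using the standard PEFT API; repo id taken from the README):

```python
from peft import PeftConfig

config = PeftConfig.from_pretrained("ayuwal12/biomistral-7b-finetuned")
print(config.base_model_name_or_path)  # BioMistral/BioMistral-7B
print(config.r, config.lora_alpha)     # 16 32
```
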
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6acb6477671a04aa0dae759554a5d2784b51a1f041302953a830bf41dac335c0
size 167832240
chat_template.jinja ADDED
@@ -0,0 +1 @@
{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}
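
This is the standard Mistral-style [INST] chat template (distinct from the ### Instruction format used for fine-tuning). A quick way to see what it renders (sketch; repo id taken from the README):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ayuwal12/biomistral-7b-finetuned")
messages = [{"role": "user", "content": "What is anemia?"}]
print(tokenizer.apply_chat_template(messages, tokenize=False))
# -> <s>[INST] What is anemia? [/INST]
```
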
checkpoint-1000/README.md ADDED
@@ -0,0 +1,207 @@
---
base_model: BioMistral/BioMistral-7B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:BioMistral/BioMistral-7B
- lora
- transformers
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.16.0
checkpoint-1000/adapter_config.json ADDED
@@ -0,0 +1,41 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "BioMistral/BioMistral-7B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "k_proj",
    "gate_proj",
    "up_proj",
    "q_proj",
    "o_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
checkpoint-1000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d096215cd5a79308ef2f002f3ed29b12ebb25b4f3aa2b8600bafe263c05e2ec8
size 167832240
checkpoint-1000/chat_template.jinja ADDED
@@ -0,0 +1 @@
{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}
checkpoint-1000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c44d9952b18d7d82a941cb04693044a0be58b4d67b9bf87344262bda89b0d60
size 335922386
checkpoint-1000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9c248bdefa931c4b8818ef14890f078eb74e00ffacb25c16b33beb19deb757d
size 14244
checkpoint-1000/scaler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a25a54ef013052084cc1af4b9237b8bf9a919c4653e785c1b249c0020f99c494
size 988
checkpoint-1000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9829c541fbe820d9473a51158fb1381e97abcf18a78e66b970ecc18cee00706a
size 1064
checkpoint-1000/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
checkpoint-1000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1000/tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
size 493443
checkpoint-1000/tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}
checkpoint-1000/trainer_state.json ADDED
@@ -0,0 +1,750 @@
{
  "best_global_step": 1000,
  "best_metric": 0.8179630041122437,
  "best_model_checkpoint": "./biomistral-lora-finetuned/checkpoint-1000",
  "epoch": 1.0823290453622207,
  "eval_steps": 500,
  "global_step": 1000,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.010832769126607989,
      "grad_norm": 0.7727395296096802,
      "learning_rate": 1.8e-05,
      "loss": 0.889,
      "step": 10
    },
    {
      "epoch": 0.021665538253215978,
      "grad_norm": 0.8008129596710205,
      "learning_rate": 3.8e-05,
      "loss": 0.8378,
      "step": 20
    },
    {
      "epoch": 0.03249830737982397,
      "grad_norm": 0.9147247076034546,
      "learning_rate": 5.8e-05,
      "loss": 0.8108,
      "step": 30
    },
    {
      "epoch": 0.043331076506431955,
      "grad_norm": 0.8121607303619385,
      "learning_rate": 7.800000000000001e-05,
      "loss": 0.8597,
      "step": 40
    },
    {
      "epoch": 0.05416384563303995,
      "grad_norm": 1.0018593072891235,
      "learning_rate": 9.8e-05,
      "loss": 0.7486,
      "step": 50
    },
    {
      "epoch": 0.06499661475964794,
      "grad_norm": 1.2048218250274658,
      "learning_rate": 0.000118,
      "loss": 0.6825,
      "step": 60
    },
    {
      "epoch": 0.07582938388625593,
      "grad_norm": 0.9863468408584595,
      "learning_rate": 0.000138,
      "loss": 0.6539,
      "step": 70
    },
    {
      "epoch": 0.08666215301286391,
      "grad_norm": 1.2911494970321655,
      "learning_rate": 0.00015800000000000002,
      "loss": 0.6198,
      "step": 80
    },
    {
      "epoch": 0.0974949221394719,
      "grad_norm": 1.159672737121582,
      "learning_rate": 0.00017800000000000002,
      "loss": 0.6222,
      "step": 90
    },
    {
      "epoch": 0.1083276912660799,
      "grad_norm": 1.0924432277679443,
      "learning_rate": 0.00019800000000000002,
      "loss": 0.5923,
      "step": 100
    },
    {
      "epoch": 0.11916046039268788,
      "grad_norm": 1.3423463106155396,
      "learning_rate": 0.00019932634730538925,
      "loss": 0.5548,
      "step": 110
    },
    {
      "epoch": 0.12999322951929587,
      "grad_norm": 1.4929102659225464,
      "learning_rate": 0.00019857784431137723,
      "loss": 0.6701,
      "step": 120
    },
    {
      "epoch": 0.14082599864590387,
      "grad_norm": 0.9462954998016357,
      "learning_rate": 0.00019782934131736527,
      "loss": 0.8675,
      "step": 130
    },
    {
      "epoch": 0.15165876777251186,
      "grad_norm": 0.9912289977073669,
      "learning_rate": 0.0001970808383233533,
      "loss": 0.9074,
      "step": 140
    },
    {
      "epoch": 0.16249153689911983,
      "grad_norm": 1.1070538759231567,
      "learning_rate": 0.00019633233532934132,
      "loss": 0.8755,
      "step": 150
    },
    {
      "epoch": 0.17332430602572782,
      "grad_norm": 0.9465340375900269,
      "learning_rate": 0.00019558383233532936,
      "loss": 0.882,
      "step": 160
    },
    {
      "epoch": 0.18415707515233581,
      "grad_norm": 0.8657329678535461,
      "learning_rate": 0.00019483532934131737,
      "loss": 0.8737,
      "step": 170
    },
    {
      "epoch": 0.1949898442789438,
      "grad_norm": 0.7293577790260315,
      "learning_rate": 0.0001940868263473054,
      "loss": 0.8473,
      "step": 180
    },
    {
      "epoch": 0.2058226134055518,
      "grad_norm": 0.849353551864624,
      "learning_rate": 0.00019333832335329343,
      "loss": 0.9414,
      "step": 190
    },
    {
      "epoch": 0.2166553825321598,
      "grad_norm": 0.7525314688682556,
      "learning_rate": 0.00019258982035928144,
      "loss": 0.8852,
      "step": 200
    },
    {
      "epoch": 0.22748815165876776,
      "grad_norm": 1.0732208490371704,
      "learning_rate": 0.00019184131736526948,
      "loss": 0.8074,
      "step": 210
    },
    {
      "epoch": 0.23832092078537576,
      "grad_norm": 0.8420374393463135,
      "learning_rate": 0.0001910928143712575,
      "loss": 0.9508,
      "step": 220
    },
    {
      "epoch": 0.24915368991198375,
      "grad_norm": 0.8308244347572327,
      "learning_rate": 0.0001903443113772455,
      "loss": 0.8734,
      "step": 230
    },
    {
      "epoch": 0.25998645903859174,
      "grad_norm": 0.9915153384208679,
      "learning_rate": 0.00018959580838323354,
      "loss": 0.8816,
      "step": 240
    },
    {
      "epoch": 0.2708192281651997,
      "grad_norm": 4.8621978759765625,
      "learning_rate": 0.00018884730538922158,
      "loss": 0.8848,
      "step": 250
    },
    {
      "epoch": 0.28165199729180773,
      "grad_norm": 0.7945590019226074,
      "learning_rate": 0.0001880988023952096,
      "loss": 0.8503,
      "step": 260
    },
    {
      "epoch": 0.2924847664184157,
      "grad_norm": 0.7896672487258911,
      "learning_rate": 0.00018735029940119763,
      "loss": 0.8798,
      "step": 270
    },
    {
      "epoch": 0.3033175355450237,
      "grad_norm": 0.8870701789855957,
      "learning_rate": 0.00018660179640718564,
      "loss": 0.9112,
      "step": 280
    },
    {
      "epoch": 0.3141503046716317,
      "grad_norm": 0.9003740549087524,
      "learning_rate": 0.00018585329341317365,
      "loss": 0.846,
      "step": 290
    },
    {
      "epoch": 0.32498307379823965,
      "grad_norm": 0.7067676186561584,
      "learning_rate": 0.0001851047904191617,
      "loss": 0.8588,
      "step": 300
    },
    {
      "epoch": 0.3358158429248477,
      "grad_norm": 0.9696246385574341,
      "learning_rate": 0.0001843562874251497,
      "loss": 0.8244,
      "step": 310
    },
    {
      "epoch": 0.34664861205145564,
      "grad_norm": 0.9892609715461731,
      "learning_rate": 0.00018360778443113774,
      "loss": 0.8214,
      "step": 320
    },
    {
      "epoch": 0.35748138117806366,
      "grad_norm": 0.822260856628418,
      "learning_rate": 0.00018285928143712575,
      "loss": 0.7977,
      "step": 330
    },
    {
      "epoch": 0.36831415030467163,
      "grad_norm": 0.7743964791297913,
      "learning_rate": 0.00018211077844311376,
      "loss": 0.8002,
      "step": 340
    },
    {
      "epoch": 0.3791469194312796,
      "grad_norm": 0.7090775370597839,
      "learning_rate": 0.0001813622754491018,
      "loss": 0.8192,
      "step": 350
    },
    {
      "epoch": 0.3899796885578876,
      "grad_norm": 1.0970802307128906,
      "learning_rate": 0.00018061377245508984,
      "loss": 0.8516,
      "step": 360
    },
    {
      "epoch": 0.4008124576844956,
      "grad_norm": 0.9633163213729858,
      "learning_rate": 0.00017986526946107785,
      "loss": 0.8414,
      "step": 370
    },
    {
      "epoch": 0.4116452268111036,
      "grad_norm": 0.6846926808357239,
      "learning_rate": 0.00017911676646706587,
      "loss": 0.8187,
      "step": 380
    },
    {
      "epoch": 0.42247799593771157,
      "grad_norm": 0.7262110710144043,
      "learning_rate": 0.0001783682634730539,
      "loss": 0.8572,
      "step": 390
    },
    {
      "epoch": 0.4333107650643196,
      "grad_norm": 0.8537372350692749,
      "learning_rate": 0.00017761976047904192,
      "loss": 0.8286,
      "step": 400
    },
    {
      "epoch": 0.44414353419092756,
      "grad_norm": 0.8860271573066711,
      "learning_rate": 0.00017687125748502996,
      "loss": 0.8416,
      "step": 410
    },
    {
      "epoch": 0.4549763033175355,
      "grad_norm": 0.7984218597412109,
      "learning_rate": 0.000176122754491018,
      "loss": 0.8373,
      "step": 420
    },
    {
      "epoch": 0.46580907244414355,
      "grad_norm": 0.8060943484306335,
      "learning_rate": 0.000175374251497006,
      "loss": 0.9165,
      "step": 430
    },
    {
      "epoch": 0.4766418415707515,
      "grad_norm": 0.7871391177177429,
      "learning_rate": 0.00017462574850299402,
      "loss": 0.8276,
      "step": 440
    },
    {
      "epoch": 0.48747461069735953,
      "grad_norm": 0.7732688784599304,
      "learning_rate": 0.00017387724550898203,
      "loss": 0.8346,
      "step": 450
    },
    {
      "epoch": 0.4983073798239675,
      "grad_norm": 0.9314000606536865,
      "learning_rate": 0.00017312874251497007,
      "loss": 0.8291,
      "step": 460
    },
    {
      "epoch": 0.5091401489505755,
      "grad_norm": 0.6721988916397095,
      "learning_rate": 0.0001723802395209581,
      "loss": 0.7091,
      "step": 470
    },
    {
      "epoch": 0.5199729180771835,
      "grad_norm": 0.825965940952301,
      "learning_rate": 0.00017163173652694612,
      "loss": 0.8934,
      "step": 480
    },
    {
      "epoch": 0.5308056872037915,
      "grad_norm": 0.8427668213844299,
      "learning_rate": 0.00017088323353293413,
      "loss": 0.7603,
      "step": 490
    },
    {
      "epoch": 0.5416384563303994,
      "grad_norm": 1.0061259269714355,
      "learning_rate": 0.00017013473053892217,
      "loss": 0.8277,
      "step": 500
    },
    {
      "epoch": 0.5416384563303994,
      "eval_loss": 0.8331602811813354,
      "eval_runtime": 355.9061,
      "eval_samples_per_second": 4.614,
      "eval_steps_per_second": 2.307,
      "step": 500
    },
    {
      "epoch": 0.5524712254570074,
      "grad_norm": 0.8820628523826599,
      "learning_rate": 0.00016938622754491018,
      "loss": 0.8348,
      "step": 510
    },
    {
      "epoch": 0.5633039945836155,
      "grad_norm": 0.8095284700393677,
      "learning_rate": 0.00016863772455089822,
      "loss": 0.9172,
      "step": 520
    },
    {
      "epoch": 0.5741367637102234,
      "grad_norm": 0.6959540843963623,
      "learning_rate": 0.00016788922155688623,
      "loss": 0.838,
      "step": 530
    },
    {
      "epoch": 0.5849695328368314,
      "grad_norm": 0.835831880569458,
      "learning_rate": 0.00016714071856287424,
      "loss": 0.8887,
      "step": 540
    },
    {
      "epoch": 0.5958023019634394,
      "grad_norm": 0.9289611577987671,
      "learning_rate": 0.00016639221556886228,
      "loss": 0.8514,
      "step": 550
    },
    {
      "epoch": 0.6066350710900474,
      "grad_norm": 0.6904628872871399,
      "learning_rate": 0.00016564371257485032,
      "loss": 0.8645,
      "step": 560
    },
    {
      "epoch": 0.6174678402166554,
      "grad_norm": 0.8879178762435913,
      "learning_rate": 0.00016489520958083833,
      "loss": 0.8201,
      "step": 570
    },
    {
      "epoch": 0.6283006093432634,
      "grad_norm": 0.8411425948143005,
      "learning_rate": 0.00016414670658682637,
      "loss": 0.836,
      "step": 580
    },
    {
      "epoch": 0.6391333784698714,
      "grad_norm": 0.8564555644989014,
      "learning_rate": 0.00016339820359281436,
      "loss": 0.7724,
      "step": 590
    },
    {
      "epoch": 0.6499661475964793,
      "grad_norm": 0.8382830619812012,
      "learning_rate": 0.0001626497005988024,
      "loss": 0.7839,
      "step": 600
    },
    {
      "epoch": 0.6607989167230873,
      "grad_norm": 0.7657437920570374,
      "learning_rate": 0.00016190119760479043,
      "loss": 0.7973,
      "step": 610
    },
    {
      "epoch": 0.6716316858496953,
      "grad_norm": 0.7758445143699646,
      "learning_rate": 0.00016115269461077845,
      "loss": 0.8111,
      "step": 620
    },
    {
      "epoch": 0.6824644549763034,
      "grad_norm": 1.0041533708572388,
      "learning_rate": 0.00016040419161676649,
      "loss": 0.8359,
      "step": 630
    },
    {
      "epoch": 0.6932972241029113,
      "grad_norm": 0.9679577946662903,
      "learning_rate": 0.0001596556886227545,
      "loss": 0.8822,
      "step": 640
    },
    {
      "epoch": 0.7041299932295193,
      "grad_norm": 0.8141391277313232,
      "learning_rate": 0.0001589071856287425,
      "loss": 0.8714,
      "step": 650
    },
    {
      "epoch": 0.7149627623561273,
      "grad_norm": 0.7982810139656067,
      "learning_rate": 0.00015815868263473055,
      "loss": 0.856,
      "step": 660
    },
    {
      "epoch": 0.7257955314827352,
      "grad_norm": 0.7932000160217285,
      "learning_rate": 0.00015741017964071859,
      "loss": 0.8405,
      "step": 670
    },
    {
      "epoch": 0.7366283006093433,
      "grad_norm": 0.7269508242607117,
      "learning_rate": 0.0001566616766467066,
      "loss": 0.8371,
      "step": 680
    },
    {
      "epoch": 0.7474610697359513,
      "grad_norm": 0.9001722931861877,
      "learning_rate": 0.0001559131736526946,
      "loss": 0.8305,
      "step": 690
    },
    {
      "epoch": 0.7582938388625592,
      "grad_norm": 0.6795508861541748,
      "learning_rate": 0.00015516467065868262,
      "loss": 0.8324,
      "step": 700
    },
    {
      "epoch": 0.7691266079891672,
      "grad_norm": 0.8868729472160339,
      "learning_rate": 0.00015441616766467066,
      "loss": 0.8521,
      "step": 710
    },
    {
      "epoch": 0.7799593771157752,
      "grad_norm": 0.9720478653907776,
      "learning_rate": 0.0001536676646706587,
      "loss": 0.7759,
      "step": 720
    },
    {
      "epoch": 0.7907921462423833,
      "grad_norm": 0.8006075620651245,
      "learning_rate": 0.0001529191616766467,
      "loss": 0.7981,
      "step": 730
    },
    {
      "epoch": 0.8016249153689912,
      "grad_norm": 0.9107721447944641,
      "learning_rate": 0.00015217065868263475,
      "loss": 0.7868,
      "step": 740
    },
    {
      "epoch": 0.8124576844955992,
      "grad_norm": 0.7584466338157654,
      "learning_rate": 0.00015142215568862276,
      "loss": 0.7401,
      "step": 750
    },
    {
      "epoch": 0.8232904536222072,
      "grad_norm": 1.0075221061706543,
      "learning_rate": 0.00015067365269461077,
      "loss": 0.8024,
      "step": 760
    },
    {
      "epoch": 0.8341232227488151,
      "grad_norm": 0.8769344091415405,
      "learning_rate": 0.0001499251497005988,
      "loss": 0.7779,
      "step": 770
    },
    {
      "epoch": 0.8449559918754231,
      "grad_norm": 0.84312903881073,
      "learning_rate": 0.00014917664670658685,
      "loss": 0.8314,
      "step": 780
    },
    {
      "epoch": 0.8557887610020312,
      "grad_norm": 0.8116353750228882,
      "learning_rate": 0.00014842814371257486,
      "loss": 0.8146,
      "step": 790
    },
    {
      "epoch": 0.8666215301286392,
      "grad_norm": 0.8301011919975281,
      "learning_rate": 0.00014767964071856287,
      "loss": 0.7422,
      "step": 800
    },
    {
      "epoch": 0.8774542992552471,
      "grad_norm": 0.8579692244529724,
      "learning_rate": 0.00014693113772455091,
      "loss": 0.7442,
      "step": 810
    },
    {
      "epoch": 0.8882870683818551,
      "grad_norm": 0.7513943910598755,
      "learning_rate": 0.00014618263473053893,
      "loss": 0.7671,
      "step": 820
    },
    {
      "epoch": 0.8991198375084631,
      "grad_norm": 0.9639107584953308,
      "learning_rate": 0.00014543413173652696,
      "loss": 0.7896,
      "step": 830
    },
    {
      "epoch": 0.909952606635071,
      "grad_norm": 0.8897636532783508,
      "learning_rate": 0.00014468562874251498,
      "loss": 0.7613,
      "step": 840
    },
    {
      "epoch": 0.9207853757616791,
      "grad_norm": 0.7998213171958923,
      "learning_rate": 0.000143937125748503,
      "loss": 0.7647,
      "step": 850
    },
    {
      "epoch": 0.9316181448882871,
      "grad_norm": 0.6916050910949707,
      "learning_rate": 0.00014318862275449103,
      "loss": 0.7697,
      "step": 860
    },
    {
      "epoch": 0.942450914014895,
      "grad_norm": 1.0154324769973755,
      "learning_rate": 0.00014244011976047904,
      "loss": 0.7314,
      "step": 870
    },
    {
      "epoch": 0.953283683141503,
      "grad_norm": 0.9787517786026001,
      "learning_rate": 0.00014169161676646708,
      "loss": 0.8047,
      "step": 880
    },
    {
      "epoch": 0.964116452268111,
      "grad_norm": 0.6035457253456116,
      "learning_rate": 0.00014094311377245512,
      "loss": 0.783,
      "step": 890
    },
    {
      "epoch": 0.9749492213947191,
      "grad_norm": 0.940951943397522,
      "learning_rate": 0.0001401946107784431,
      "loss": 0.7741,
      "step": 900
    },
    {
      "epoch": 0.985781990521327,
      "grad_norm": 0.7785654067993164,
      "learning_rate": 0.00013944610778443114,
      "loss": 0.7855,
      "step": 910
    },
    {
      "epoch": 0.996614759647935,
      "grad_norm": 0.8356137275695801,
      "learning_rate": 0.00013869760479041918,
      "loss": 0.8292,
      "step": 920
    },
    {
      "epoch": 1.0064996614759647,
      "grad_norm": 0.6590499877929688,
      "learning_rate": 0.0001379491017964072,
      "loss": 0.6858,
      "step": 930
    },
    {
      "epoch": 1.0173324306025728,
      "grad_norm": 1.0389671325683594,
      "learning_rate": 0.00013720059880239523,
      "loss": 0.6097,
      "step": 940
    },
    {
      "epoch": 1.0281651997291807,
      "grad_norm": 0.9596243500709534,
      "learning_rate": 0.00013645209580838324,
      "loss": 0.5676,
      "step": 950
    },
    {
      "epoch": 1.0389979688557887,
      "grad_norm": 1.0831798315048218,
      "learning_rate": 0.00013570359281437125,
      "loss": 0.6106,
      "step": 960
    },
    {
      "epoch": 1.0498307379823968,
      "grad_norm": 0.92978835105896,
      "learning_rate": 0.0001349550898203593,
      "loss": 0.5924,
      "step": 970
    },
    {
      "epoch": 1.0606635071090047,
      "grad_norm": 0.9672062993049622,
      "learning_rate": 0.0001342065868263473,
      "loss": 0.5496,
      "step": 980
    },
    {
      "epoch": 1.0714962762356128,
      "grad_norm": 1.1402652263641357,
      "learning_rate": 0.00013345808383233534,
      "loss": 0.5871,
      "step": 990
    },
    {
      "epoch": 1.0823290453622207,
      "grad_norm": 1.1109035015106201,
      "learning_rate": 0.00013270958083832335,
      "loss": 0.5424,
      "step": 1000
    },
    {
      "epoch": 1.0823290453622207,
      "eval_loss": 0.8179630041122437,
      "eval_runtime": 357.2769,
      "eval_samples_per_second": 4.596,
      "eval_steps_per_second": 2.298,
      "step": 1000
    }
  ],
  "logging_steps": 10,
  "max_steps": 2772,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 3,
  "save_steps": 500,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 3.523118244330209e+17,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
}
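
The `log_history` array above holds the raw loss curve; a small sketch of pulling it out for inspection or plotting:

```python
import json

# Extract (step, loss) pairs from a checkpoint's trainer state.
with open("checkpoint-1000/trainer_state.json") as f:
    state = json.load(f)

train_points = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
eval_points = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]
print(train_points[-1], eval_points[-1])  # (1000, 0.5424) (1000, 0.8179630041122437)
```
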
checkpoint-1000/training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb57b3addd13a91af4f53634dd1e6a17845286b1cee5e38cd21da8e2bf179c7f
size 5304
checkpoint-1500/README.md ADDED
@@ -0,0 +1,207 @@
---
base_model: BioMistral/BioMistral-7B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:BioMistral/BioMistral-7B
- lora
- transformers
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.16.0
checkpoint-1500/adapter_config.json ADDED
@@ -0,0 +1,41 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "BioMistral/BioMistral-7B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "k_proj",
    "gate_proj",
    "up_proj",
    "q_proj",
    "o_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
checkpoint-1500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6acb6477671a04aa0dae759554a5d2784b51a1f041302953a830bf41dac335c0
size 167832240
checkpoint-1500/chat_template.jinja ADDED
@@ -0,0 +1 @@
{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}
checkpoint-1500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e24d1aeae71b0c686a0c00153e93e6c8332148b91f1a06464f5e7331284b5850
size 335922386
checkpoint-1500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6492fa1abb8928e85806aef548738bd43054b1594362687738367dfdf1836137
size 14244
checkpoint-1500/scaler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:54bb4f2ea251861747e8fc194eb844d57f95dac1c25d302b4ad59b349b681af6
size 988
checkpoint-1500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0455cd3e16cc5d63c9bdb4bcd02d9fd21bd515cbcda2087df9901523b6b81055
size 1064
checkpoint-1500/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
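
One detail worth flagging: `pad_token` is mapped to the EOS string `"</s>"`. Mistral ships no dedicated padding token, so reusing EOS (and masking padded positions out of the loss) is the standard workaround for batched causal-LM training; a sketch of the usual setup:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BioMistral/BioMistral-7B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # exactly what this map encodes
```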
checkpoint-1500/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1500/tokenizer.model ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
3
+ size 493443
checkpoint-1500/tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": null,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
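
Two settings here matter when preparing training examples: `add_bos_token` is true but `add_eos_token` is false, so the tokenizer never terminates a sequence on its own — EOS must come from the chat template or be appended manually — and `model_max_length` is the "unbounded" sentinel, so any length cap has to be passed explicitly. A sketch under those assumptions (the text and the 512-token cap are illustrative):

```python
# Sketch: terminate plain-text examples yourself when add_eos_token is false.
text = "Question: What is glaucoma? Answer: ..."  # illustrative
ids = tokenizer(text, truncation=True, max_length=511)["input_ids"]
ids.append(tokenizer.eos_token_id)  # at most 512 tokens total after appending EOS
```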
checkpoint-1500/trainer_state.json ADDED
@@ -0,0 +1,1108 @@
+ {
+   "best_global_step": 1500,
+   "best_metric": 0.7986094355583191,
+   "best_model_checkpoint": "./biomistral-lora-finetuned/checkpoint-1500",
+   "epoch": 1.6239675016926203,
+   "eval_steps": 500,
+   "global_step": 1500,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     { "epoch": 0.010832769126607989, "grad_norm": 0.7727395296096802, "learning_rate": 1.8e-05, "loss": 0.889, "step": 10 },
+     { "epoch": 0.021665538253215978, "grad_norm": 0.8008129596710205, "learning_rate": 3.8e-05, "loss": 0.8378, "step": 20 },
+     { "epoch": 0.03249830737982397, "grad_norm": 0.9147247076034546, "learning_rate": 5.8e-05, "loss": 0.8108, "step": 30 },
+     { "epoch": 0.043331076506431955, "grad_norm": 0.8121607303619385, "learning_rate": 7.800000000000001e-05, "loss": 0.8597, "step": 40 },
+     { "epoch": 0.05416384563303995, "grad_norm": 1.0018593072891235, "learning_rate": 9.8e-05, "loss": 0.7486, "step": 50 },
+     { "epoch": 0.06499661475964794, "grad_norm": 1.2048218250274658, "learning_rate": 0.000118, "loss": 0.6825, "step": 60 },
+     { "epoch": 0.07582938388625593, "grad_norm": 0.9863468408584595, "learning_rate": 0.000138, "loss": 0.6539, "step": 70 },
+     { "epoch": 0.08666215301286391, "grad_norm": 1.2911494970321655, "learning_rate": 0.00015800000000000002, "loss": 0.6198, "step": 80 },
+     { "epoch": 0.0974949221394719, "grad_norm": 1.159672737121582, "learning_rate": 0.00017800000000000002, "loss": 0.6222, "step": 90 },
+     { "epoch": 0.1083276912660799, "grad_norm": 1.0924432277679443, "learning_rate": 0.00019800000000000002, "loss": 0.5923, "step": 100 },
+     { "epoch": 0.11916046039268788, "grad_norm": 1.3423463106155396, "learning_rate": 0.00019932634730538925, "loss": 0.5548, "step": 110 },
+     { "epoch": 0.12999322951929587, "grad_norm": 1.4929102659225464, "learning_rate": 0.00019857784431137723, "loss": 0.6701, "step": 120 },
+     { "epoch": 0.14082599864590387, "grad_norm": 0.9462954998016357, "learning_rate": 0.00019782934131736527, "loss": 0.8675, "step": 130 },
+     { "epoch": 0.15165876777251186, "grad_norm": 0.9912289977073669, "learning_rate": 0.0001970808383233533, "loss": 0.9074, "step": 140 },
+     { "epoch": 0.16249153689911983, "grad_norm": 1.1070538759231567, "learning_rate": 0.00019633233532934132, "loss": 0.8755, "step": 150 },
+     { "epoch": 0.17332430602572782, "grad_norm": 0.9465340375900269, "learning_rate": 0.00019558383233532936, "loss": 0.882, "step": 160 },
+     { "epoch": 0.18415707515233581, "grad_norm": 0.8657329678535461, "learning_rate": 0.00019483532934131737, "loss": 0.8737, "step": 170 },
+     { "epoch": 0.1949898442789438, "grad_norm": 0.7293577790260315, "learning_rate": 0.0001940868263473054, "loss": 0.8473, "step": 180 },
+     { "epoch": 0.2058226134055518, "grad_norm": 0.849353551864624, "learning_rate": 0.00019333832335329343, "loss": 0.9414, "step": 190 },
+     { "epoch": 0.2166553825321598, "grad_norm": 0.7525314688682556, "learning_rate": 0.00019258982035928144, "loss": 0.8852, "step": 200 },
+     { "epoch": 0.22748815165876776, "grad_norm": 1.0732208490371704, "learning_rate": 0.00019184131736526948, "loss": 0.8074, "step": 210 },
+     { "epoch": 0.23832092078537576, "grad_norm": 0.8420374393463135, "learning_rate": 0.0001910928143712575, "loss": 0.9508, "step": 220 },
+     { "epoch": 0.24915368991198375, "grad_norm": 0.8308244347572327, "learning_rate": 0.0001903443113772455, "loss": 0.8734, "step": 230 },
+     { "epoch": 0.25998645903859174, "grad_norm": 0.9915153384208679, "learning_rate": 0.00018959580838323354, "loss": 0.8816, "step": 240 },
+     { "epoch": 0.2708192281651997, "grad_norm": 4.8621978759765625, "learning_rate": 0.00018884730538922158, "loss": 0.8848, "step": 250 },
+     { "epoch": 0.28165199729180773, "grad_norm": 0.7945590019226074, "learning_rate": 0.0001880988023952096, "loss": 0.8503, "step": 260 },
+     { "epoch": 0.2924847664184157, "grad_norm": 0.7896672487258911, "learning_rate": 0.00018735029940119763, "loss": 0.8798, "step": 270 },
+     { "epoch": 0.3033175355450237, "grad_norm": 0.8870701789855957, "learning_rate": 0.00018660179640718564, "loss": 0.9112, "step": 280 },
+     { "epoch": 0.3141503046716317, "grad_norm": 0.9003740549087524, "learning_rate": 0.00018585329341317365, "loss": 0.846, "step": 290 },
+     { "epoch": 0.32498307379823965, "grad_norm": 0.7067676186561584, "learning_rate": 0.0001851047904191617, "loss": 0.8588, "step": 300 },
+     { "epoch": 0.3358158429248477, "grad_norm": 0.9696246385574341, "learning_rate": 0.0001843562874251497, "loss": 0.8244, "step": 310 },
+     { "epoch": 0.34664861205145564, "grad_norm": 0.9892609715461731, "learning_rate": 0.00018360778443113774, "loss": 0.8214, "step": 320 },
+     { "epoch": 0.35748138117806366, "grad_norm": 0.822260856628418, "learning_rate": 0.00018285928143712575, "loss": 0.7977, "step": 330 },
+     { "epoch": 0.36831415030467163, "grad_norm": 0.7743964791297913, "learning_rate": 0.00018211077844311376, "loss": 0.8002, "step": 340 },
+     { "epoch": 0.3791469194312796, "grad_norm": 0.7090775370597839, "learning_rate": 0.0001813622754491018, "loss": 0.8192, "step": 350 },
+     { "epoch": 0.3899796885578876, "grad_norm": 1.0970802307128906, "learning_rate": 0.00018061377245508984, "loss": 0.8516, "step": 360 },
+     { "epoch": 0.4008124576844956, "grad_norm": 0.9633163213729858, "learning_rate": 0.00017986526946107785, "loss": 0.8414, "step": 370 },
+     { "epoch": 0.4116452268111036, "grad_norm": 0.6846926808357239, "learning_rate": 0.00017911676646706587, "loss": 0.8187, "step": 380 },
+     { "epoch": 0.42247799593771157, "grad_norm": 0.7262110710144043, "learning_rate": 0.0001783682634730539, "loss": 0.8572, "step": 390 },
+     { "epoch": 0.4333107650643196, "grad_norm": 0.8537372350692749, "learning_rate": 0.00017761976047904192, "loss": 0.8286, "step": 400 },
+     { "epoch": 0.44414353419092756, "grad_norm": 0.8860271573066711, "learning_rate": 0.00017687125748502996, "loss": 0.8416, "step": 410 },
+     { "epoch": 0.4549763033175355, "grad_norm": 0.7984218597412109, "learning_rate": 0.000176122754491018, "loss": 0.8373, "step": 420 },
+     { "epoch": 0.46580907244414355, "grad_norm": 0.8060943484306335, "learning_rate": 0.000175374251497006, "loss": 0.9165, "step": 430 },
+     { "epoch": 0.4766418415707515, "grad_norm": 0.7871391177177429, "learning_rate": 0.00017462574850299402, "loss": 0.8276, "step": 440 },
+     { "epoch": 0.48747461069735953, "grad_norm": 0.7732688784599304, "learning_rate": 0.00017387724550898203, "loss": 0.8346, "step": 450 },
+     { "epoch": 0.4983073798239675, "grad_norm": 0.9314000606536865, "learning_rate": 0.00017312874251497007, "loss": 0.8291, "step": 460 },
+     { "epoch": 0.5091401489505755, "grad_norm": 0.6721988916397095, "learning_rate": 0.0001723802395209581, "loss": 0.7091, "step": 470 },
+     { "epoch": 0.5199729180771835, "grad_norm": 0.825965940952301, "learning_rate": 0.00017163173652694612, "loss": 0.8934, "step": 480 },
+     { "epoch": 0.5308056872037915, "grad_norm": 0.8427668213844299, "learning_rate": 0.00017088323353293413, "loss": 0.7603, "step": 490 },
+     { "epoch": 0.5416384563303994, "grad_norm": 1.0061259269714355, "learning_rate": 0.00017013473053892217, "loss": 0.8277, "step": 500 },
+     { "epoch": 0.5416384563303994, "eval_loss": 0.8331602811813354, "eval_runtime": 355.9061, "eval_samples_per_second": 4.614, "eval_steps_per_second": 2.307, "step": 500 },
+     { "epoch": 0.5524712254570074, "grad_norm": 0.8820628523826599, "learning_rate": 0.00016938622754491018, "loss": 0.8348, "step": 510 },
+     { "epoch": 0.5633039945836155, "grad_norm": 0.8095284700393677, "learning_rate": 0.00016863772455089822, "loss": 0.9172, "step": 520 },
+     { "epoch": 0.5741367637102234, "grad_norm": 0.6959540843963623, "learning_rate": 0.00016788922155688623, "loss": 0.838, "step": 530 },
+     { "epoch": 0.5849695328368314, "grad_norm": 0.835831880569458, "learning_rate": 0.00016714071856287424, "loss": 0.8887, "step": 540 },
+     { "epoch": 0.5958023019634394, "grad_norm": 0.9289611577987671, "learning_rate": 0.00016639221556886228, "loss": 0.8514, "step": 550 },
+     { "epoch": 0.6066350710900474, "grad_norm": 0.6904628872871399, "learning_rate": 0.00016564371257485032, "loss": 0.8645, "step": 560 },
+     { "epoch": 0.6174678402166554, "grad_norm": 0.8879178762435913, "learning_rate": 0.00016489520958083833, "loss": 0.8201, "step": 570 },
+     { "epoch": 0.6283006093432634, "grad_norm": 0.8411425948143005, "learning_rate": 0.00016414670658682637, "loss": 0.836, "step": 580 },
+     { "epoch": 0.6391333784698714, "grad_norm": 0.8564555644989014, "learning_rate": 0.00016339820359281436, "loss": 0.7724, "step": 590 },
+     { "epoch": 0.6499661475964793, "grad_norm": 0.8382830619812012, "learning_rate": 0.0001626497005988024, "loss": 0.7839, "step": 600 },
+     { "epoch": 0.6607989167230873, "grad_norm": 0.7657437920570374, "learning_rate": 0.00016190119760479043, "loss": 0.7973, "step": 610 },
+     { "epoch": 0.6716316858496953, "grad_norm": 0.7758445143699646, "learning_rate": 0.00016115269461077845, "loss": 0.8111, "step": 620 },
+     { "epoch": 0.6824644549763034, "grad_norm": 1.0041533708572388, "learning_rate": 0.00016040419161676649, "loss": 0.8359, "step": 630 },
+     { "epoch": 0.6932972241029113, "grad_norm": 0.9679577946662903, "learning_rate": 0.0001596556886227545, "loss": 0.8822, "step": 640 },
+     { "epoch": 0.7041299932295193, "grad_norm": 0.8141391277313232, "learning_rate": 0.0001589071856287425, "loss": 0.8714, "step": 650 },
+     { "epoch": 0.7149627623561273, "grad_norm": 0.7982810139656067, "learning_rate": 0.00015815868263473055, "loss": 0.856, "step": 660 },
+     { "epoch": 0.7257955314827352, "grad_norm": 0.7932000160217285, "learning_rate": 0.00015741017964071859, "loss": 0.8405, "step": 670 },
+     { "epoch": 0.7366283006093433, "grad_norm": 0.7269508242607117, "learning_rate": 0.0001566616766467066, "loss": 0.8371, "step": 680 },
+     { "epoch": 0.7474610697359513, "grad_norm": 0.9001722931861877, "learning_rate": 0.0001559131736526946, "loss": 0.8305, "step": 690 },
+     { "epoch": 0.7582938388625592, "grad_norm": 0.6795508861541748, "learning_rate": 0.00015516467065868262, "loss": 0.8324, "step": 700 },
+     { "epoch": 0.7691266079891672, "grad_norm": 0.8868729472160339, "learning_rate": 0.00015441616766467066, "loss": 0.8521, "step": 710 },
+     { "epoch": 0.7799593771157752, "grad_norm": 0.9720478653907776, "learning_rate": 0.0001536676646706587, "loss": 0.7759, "step": 720 },
+     { "epoch": 0.7907921462423833, "grad_norm": 0.8006075620651245, "learning_rate": 0.0001529191616766467, "loss": 0.7981, "step": 730 },
+     { "epoch": 0.8016249153689912, "grad_norm": 0.9107721447944641, "learning_rate": 0.00015217065868263475, "loss": 0.7868, "step": 740 },
+     { "epoch": 0.8124576844955992, "grad_norm": 0.7584466338157654, "learning_rate": 0.00015142215568862276, "loss": 0.7401, "step": 750 },
+     { "epoch": 0.8232904536222072, "grad_norm": 1.0075221061706543, "learning_rate": 0.00015067365269461077, "loss": 0.8024, "step": 760 },
+     { "epoch": 0.8341232227488151, "grad_norm": 0.8769344091415405, "learning_rate": 0.0001499251497005988, "loss": 0.7779, "step": 770 },
+     { "epoch": 0.8449559918754231, "grad_norm": 0.84312903881073, "learning_rate": 0.00014917664670658685, "loss": 0.8314, "step": 780 },
+     { "epoch": 0.8557887610020312, "grad_norm": 0.8116353750228882, "learning_rate": 0.00014842814371257486, "loss": 0.8146, "step": 790 },
+     { "epoch": 0.8666215301286392, "grad_norm": 0.8301011919975281, "learning_rate": 0.00014767964071856287, "loss": 0.7422, "step": 800 },
+     { "epoch": 0.8774542992552471, "grad_norm": 0.8579692244529724, "learning_rate": 0.00014693113772455091, "loss": 0.7442, "step": 810 },
+     { "epoch": 0.8882870683818551, "grad_norm": 0.7513943910598755, "learning_rate": 0.00014618263473053893, "loss": 0.7671, "step": 820 },
+     { "epoch": 0.8991198375084631, "grad_norm": 0.9639107584953308, "learning_rate": 0.00014543413173652696, "loss": 0.7896, "step": 830 },
+     { "epoch": 0.909952606635071, "grad_norm": 0.8897636532783508, "learning_rate": 0.00014468562874251498, "loss": 0.7613, "step": 840 },
+     { "epoch": 0.9207853757616791, "grad_norm": 0.7998213171958923, "learning_rate": 0.000143937125748503, "loss": 0.7647, "step": 850 },
+     { "epoch": 0.9316181448882871, "grad_norm": 0.6916050910949707, "learning_rate": 0.00014318862275449103, "loss": 0.7697, "step": 860 },
+     { "epoch": 0.942450914014895, "grad_norm": 1.0154324769973755, "learning_rate": 0.00014244011976047904, "loss": 0.7314, "step": 870 },
+     { "epoch": 0.953283683141503, "grad_norm": 0.9787517786026001, "learning_rate": 0.00014169161676646708, "loss": 0.8047, "step": 880 },
+     { "epoch": 0.964116452268111, "grad_norm": 0.6035457253456116, "learning_rate": 0.00014094311377245512, "loss": 0.783, "step": 890 },
+     { "epoch": 0.9749492213947191, "grad_norm": 0.940951943397522, "learning_rate": 0.0001401946107784431, "loss": 0.7741, "step": 900 },
+     { "epoch": 0.985781990521327, "grad_norm": 0.7785654067993164, "learning_rate": 0.00013944610778443114, "loss": 0.7855, "step": 910 },
+     { "epoch": 0.996614759647935, "grad_norm": 0.8356137275695801, "learning_rate": 0.00013869760479041918, "loss": 0.8292, "step": 920 },
+     { "epoch": 1.0064996614759647, "grad_norm": 0.6590499877929688, "learning_rate": 0.0001379491017964072, "loss": 0.6858, "step": 930 },
+     { "epoch": 1.0173324306025728, "grad_norm": 1.0389671325683594, "learning_rate": 0.00013720059880239523, "loss": 0.6097, "step": 940 },
+     { "epoch": 1.0281651997291807, "grad_norm": 0.9596243500709534, "learning_rate": 0.00013645209580838324, "loss": 0.5676, "step": 950 },
+     { "epoch": 1.0389979688557887, "grad_norm": 1.0831798315048218, "learning_rate": 0.00013570359281437125, "loss": 0.6106, "step": 960 },
+     { "epoch": 1.0498307379823968, "grad_norm": 0.92978835105896, "learning_rate": 0.0001349550898203593, "loss": 0.5924, "step": 970 },
+     { "epoch": 1.0606635071090047, "grad_norm": 0.9672062993049622, "learning_rate": 0.0001342065868263473, "loss": 0.5496, "step": 980 },
+     { "epoch": 1.0714962762356128, "grad_norm": 1.1402652263641357, "learning_rate": 0.00013345808383233534, "loss": 0.5871, "step": 990 },
+     { "epoch": 1.0823290453622207, "grad_norm": 1.1109035015106201, "learning_rate": 0.00013270958083832335, "loss": 0.5424, "step": 1000 },
+     { "epoch": 1.0823290453622207, "eval_loss": 0.8179630041122437, "eval_runtime": 357.2769, "eval_samples_per_second": 4.596, "eval_steps_per_second": 2.298, "step": 1000 },
+     { "epoch": 1.0931618144888287, "grad_norm": 0.8117087483406067, "learning_rate": 0.00013196107784431137, "loss": 0.5636, "step": 1010 },
+     { "epoch": 1.1039945836154368, "grad_norm": 0.86320561170578, "learning_rate": 0.0001312125748502994, "loss": 0.5191, "step": 1020 },
+     { "epoch": 1.1148273527420447, "grad_norm": 1.1274133920669556, "learning_rate": 0.00013046407185628744, "loss": 0.5891, "step": 1030 },
+     { "epoch": 1.1256601218686526, "grad_norm": 1.0116336345672607, "learning_rate": 0.00012971556886227546, "loss": 0.5579, "step": 1040 },
+     { "epoch": 1.1364928909952607, "grad_norm": 0.9277855157852173, "learning_rate": 0.0001289670658682635, "loss": 0.5971, "step": 1050 },
+     { "epoch": 1.1473256601218687, "grad_norm": 1.0700503587722778, "learning_rate": 0.0001282185628742515, "loss": 0.5815, "step": 1060 },
+     { "epoch": 1.1581584292484766, "grad_norm": 0.9346574544906616, "learning_rate": 0.00012747005988023952, "loss": 0.5472, "step": 1070 },
+     { "epoch": 1.1689911983750847, "grad_norm": 1.047631025314331, "learning_rate": 0.00012672155688622756, "loss": 0.5479, "step": 1080 },
+     { "epoch": 1.1798239675016926, "grad_norm": 0.9931487441062927, "learning_rate": 0.00012597305389221557, "loss": 0.5521, "step": 1090 },
+     { "epoch": 1.1906567366283005, "grad_norm": 0.9764857292175293, "learning_rate": 0.0001252245508982036, "loss": 0.584, "step": 1100 },
+     { "epoch": 1.2014895057549086, "grad_norm": 1.0661903619766235, "learning_rate": 0.00012447604790419162, "loss": 0.6101, "step": 1110 },
+     { "epoch": 1.2123222748815166, "grad_norm": 1.0962295532226562, "learning_rate": 0.00012372754491017963, "loss": 0.6028, "step": 1120 },
+     { "epoch": 1.2231550440081245, "grad_norm": 0.9794766306877136, "learning_rate": 0.00012297904191616767, "loss": 0.5813, "step": 1130 },
+     { "epoch": 1.2339878131347326, "grad_norm": 0.9556275606155396, "learning_rate": 0.0001222305389221557, "loss": 0.5662, "step": 1140 },
+     { "epoch": 1.2448205822613405, "grad_norm": 1.1200224161148071, "learning_rate": 0.0001214820359281437, "loss": 0.5642, "step": 1150 },
+     { "epoch": 1.2556533513879486, "grad_norm": 1.0518434047698975, "learning_rate": 0.00012073353293413175, "loss": 0.6126, "step": 1160 },
+     { "epoch": 1.2664861205145566, "grad_norm": 1.1709963083267212, "learning_rate": 0.00011998502994011977, "loss": 0.5189, "step": 1170 },
+     { "epoch": 1.2773188896411645, "grad_norm": 0.8867760896682739, "learning_rate": 0.00011923652694610778, "loss": 0.6098, "step": 1180 },
+     { "epoch": 1.2881516587677724, "grad_norm": 0.9317127466201782, "learning_rate": 0.00011848802395209582, "loss": 0.5667, "step": 1190 },
+     { "epoch": 1.2989844278943805, "grad_norm": 1.1382100582122803, "learning_rate": 0.00011773952095808385, "loss": 0.5756, "step": 1200 },
+     { "epoch": 1.3098171970209884, "grad_norm": 0.9819681644439697, "learning_rate": 0.00011699101796407186, "loss": 0.5922, "step": 1210 },
+     { "epoch": 1.3206499661475966, "grad_norm": 1.0776174068450928, "learning_rate": 0.00011624251497005988, "loss": 0.5728, "step": 1220 },
+     { "epoch": 1.3314827352742045, "grad_norm": 1.0137302875518799, "learning_rate": 0.0001154940119760479, "loss": 0.5603, "step": 1230 },
+     { "epoch": 1.3423155044008124, "grad_norm": 1.1223585605621338, "learning_rate": 0.00011474550898203593, "loss": 0.5639, "step": 1240 },
+     { "epoch": 1.3531482735274205, "grad_norm": 0.8942229747772217, "learning_rate": 0.00011399700598802396, "loss": 0.586, "step": 1250 },
+     { "epoch": 1.3639810426540284, "grad_norm": 1.225698709487915, "learning_rate": 0.00011324850299401197, "loss": 0.563, "step": 1260 },
+     { "epoch": 1.3748138117806366, "grad_norm": 1.159463882446289, "learning_rate": 0.00011250000000000001, "loss": 0.5898, "step": 1270 },
+     { "epoch": 1.3856465809072445, "grad_norm": 1.0059807300567627, "learning_rate": 0.00011175149700598804, "loss": 0.6096, "step": 1280 },
+     { "epoch": 1.3964793500338524, "grad_norm": 1.1433062553405762, "learning_rate": 0.00011100299401197605, "loss": 0.5411, "step": 1290 },
+     { "epoch": 1.4073121191604603, "grad_norm": 1.0282905101776123, "learning_rate": 0.00011025449101796407, "loss": 0.5928, "step": 1300 },
+     { "epoch": 1.4181448882870684, "grad_norm": 0.8389853835105896, "learning_rate": 0.00010950598802395211, "loss": 0.5657, "step": 1310 },
+     { "epoch": 1.4289776574136763, "grad_norm": 1.132350206375122, "learning_rate": 0.00010875748502994012, "loss": 0.6196, "step": 1320 },
+     { "epoch": 1.4398104265402845, "grad_norm": 1.1093621253967285, "learning_rate": 0.00010800898203592815, "loss": 0.5845, "step": 1330 },
+     { "epoch": 1.4506431956668924, "grad_norm": 1.3198816776275635, "learning_rate": 0.00010726047904191616, "loss": 0.5711, "step": 1340 },
+     { "epoch": 1.4614759647935003, "grad_norm": 0.8968690037727356, "learning_rate": 0.0001065119760479042, "loss": 0.6075, "step": 1350 },
+     { "epoch": 1.4723087339201082, "grad_norm": 1.0248963832855225, "learning_rate": 0.00010576347305389222, "loss": 0.5869, "step": 1360 },
+     { "epoch": 1.4831415030467163, "grad_norm": 1.2115412950515747, "learning_rate": 0.00010501497005988024, "loss": 0.549, "step": 1370 },
+     { "epoch": 1.4939742721733242, "grad_norm": 1.1320476531982422, "learning_rate": 0.00010426646706586826, "loss": 0.5661, "step": 1380 },
+     { "epoch": 1.5048070412999324, "grad_norm": 1.0099844932556152, "learning_rate": 0.0001035179640718563, "loss": 0.5953, "step": 1390 },
+     { "epoch": 1.5156398104265403, "grad_norm": 0.9809553623199463, "learning_rate": 0.00010276946107784431, "loss": 0.578, "step": 1400 },
+     { "epoch": 1.5264725795531482, "grad_norm": 1.4169446229934692, "learning_rate": 0.00010202095808383234, "loss": 0.6173, "step": 1410 },
+     { "epoch": 1.537305348679756, "grad_norm": 1.1033852100372314, "learning_rate": 0.00010127245508982038, "loss": 0.5917, "step": 1420 },
+     { "epoch": 1.5481381178063642, "grad_norm": 1.1163372993469238, "learning_rate": 0.00010052395209580839, "loss": 0.589, "step": 1430 },
+     { "epoch": 1.5589708869329724, "grad_norm": 0.9786676168441772, "learning_rate": 9.977544910179641e-05, "loss": 0.5425, "step": 1440 },
+     { "epoch": 1.5698036560595803, "grad_norm": 1.034001111984253, "learning_rate": 9.902694610778444e-05, "loss": 0.5467, "step": 1450 },
+     { "epoch": 1.5806364251861882, "grad_norm": 0.8697665929794312, "learning_rate": 9.827844311377245e-05, "loss": 0.5882, "step": 1460 },
+     { "epoch": 1.591469194312796, "grad_norm": 1.0091935396194458, "learning_rate": 9.752994011976049e-05, "loss": 0.573, "step": 1470 },
+     { "epoch": 1.6023019634394042, "grad_norm": 1.0126501321792603, "learning_rate": 9.678143712574852e-05, "loss": 0.6083, "step": 1480 },
+     { "epoch": 1.6131347325660121, "grad_norm": 0.9271785020828247, "learning_rate": 9.603293413173653e-05, "loss": 0.5564, "step": 1490 },
+     { "epoch": 1.6239675016926203, "grad_norm": 1.0736253261566162, "learning_rate": 9.528443113772455e-05, "loss": 0.5696, "step": 1500 },
+     { "epoch": 1.6239675016926203, "eval_loss": 0.7986094355583191, "eval_runtime": 358.2761, "eval_samples_per_second": 4.583, "eval_steps_per_second": 2.292, "step": 1500 }
+   ],
+   "logging_steps": 10,
+   "max_steps": 2772,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 3,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 5.280941991033569e+17,
+   "train_batch_size": 2,
+   "trial_name": null,
+   "trial_params": null
+ }
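
This state is where the upload's headline numbers live: eval_loss falls 0.8332 → 0.8180 → 0.7986 across the three evaluations, so step 1500 is recorded as `best_model_checkpoint`. A few hyperparameters can also be inferred from the numbers (inferences, not stated anywhere in the files): 1500 steps at epoch 1.624 gives ~924 optimizer steps per epoch, and with `train_batch_size` 2 that suggests gradient accumulation of 8, i.e. an effective batch of 16 over a train split of roughly 14.8k examples, with ~1,642 eval samples (eval_runtime × eval_samples_per_second). Converting the logged eval losses to perplexity:

```python
import math

eval_losses = {500: 0.8331602811813354, 1000: 0.8179630041122437, 1500: 0.7986094355583191}
for step, loss in eval_losses.items():
    print(step, round(math.exp(loss), 3))  # 2.301, 2.266, 2.222 -- best at step 1500
```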
checkpoint-1500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb57b3addd13a91af4f53634dd1e6a17845286b1cee5e38cd21da8e2bf179c7f
+ size 5304
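
training_args.bin is a pickled `TrainingArguments` object (hence the ~5 KB size) and can be loaded back to inspect exactly which hyperparameters produced this run. A sketch — `weights_only=False` is needed on recent PyTorch because this is an arbitrary pickle, so only load checkpoints you trust:

```python
import torch

args = torch.load(
    "./biomistral-lora-finetuned/checkpoint-1500/training_args.bin", weights_only=False
)
print(args.learning_rate, args.per_device_train_batch_size, args.num_train_epochs)
```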
checkpoint-2000/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: BioMistral/BioMistral-7B
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:BioMistral/BioMistral-7B
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.16.0
checkpoint-2000/adapter_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "BioMistral/BioMistral-7B",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "down_proj",
+     "k_proj",
+     "gate_proj",
+     "up_proj",
+     "q_proj",
+     "o_proj",
+     "v_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
checkpoint-2000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd05c5221728172b32903a08c1f23e14e7940ee3d5966867b6bdf6832ca1577a
+ size 167832240
checkpoint-2000/chat_template.jinja ADDED
@@ -0,0 +1 @@
+ {{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}
checkpoint-2000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fe8da5c95eebc12a0c3cc50e53428ecb43730ad14835546d055560c72f55bc2
+ size 335922386
checkpoint-2000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24310892b0a3280ab0672041351f54ecc2135b28fd61730f51cebbc6be2c0466
+ size 14244
checkpoint-2000/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2534f71902c9d04ae6347b67f8675e467c1b2ff5627e9c11581ea7479caaba7c
+ size 988
checkpoint-2000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:41b7aede3a1bc21b4c97c5d108f5c1a0971999e3de6f79f45bc105eb7c419b8f
+ size 1064
checkpoint-2000/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-2000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-2000/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
checkpoint-2000/tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": null,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
checkpoint-2000/trainer_state.json ADDED
@@ -0,0 +1,1466 @@
+ {
+   "best_global_step": 1500,
+   "best_metric": 0.7986094355583191,
+   "best_model_checkpoint": "./biomistral-lora-finetuned/checkpoint-1500",
+   "epoch": 2.1646580907244415,
+   "eval_steps": 500,
+   "global_step": 2000,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     { "epoch": 0.010832769126607989, "grad_norm": 0.7727395296096802, "learning_rate": 1.8e-05, "loss": 0.889, "step": 10 },
+     { "epoch": 0.021665538253215978, "grad_norm": 0.8008129596710205, "learning_rate": 3.8e-05, "loss": 0.8378, "step": 20 },
+     { "epoch": 0.03249830737982397, "grad_norm": 0.9147247076034546, "learning_rate": 5.8e-05, "loss": 0.8108, "step": 30 },
+     { "epoch": 0.043331076506431955, "grad_norm": 0.8121607303619385, "learning_rate": 7.800000000000001e-05, "loss": 0.8597, "step": 40 },
+     { "epoch": 0.05416384563303995, "grad_norm": 1.0018593072891235, "learning_rate": 9.8e-05, "loss": 0.7486, "step": 50 },
+     { "epoch": 0.06499661475964794, "grad_norm": 1.2048218250274658, "learning_rate": 0.000118, "loss": 0.6825, "step": 60 },
+     { "epoch": 0.07582938388625593, "grad_norm": 0.9863468408584595, "learning_rate": 0.000138, "loss": 0.6539, "step": 70 },
+     { "epoch": 0.08666215301286391, "grad_norm": 1.2911494970321655, "learning_rate": 0.00015800000000000002, "loss": 0.6198, "step": 80 },
+     { "epoch": 0.0974949221394719, "grad_norm": 1.159672737121582, "learning_rate": 0.00017800000000000002, "loss": 0.6222, "step": 90 },
+     { "epoch": 0.1083276912660799, "grad_norm": 1.0924432277679443, "learning_rate": 0.00019800000000000002, "loss": 0.5923, "step": 100 },
+     { "epoch": 0.11916046039268788, "grad_norm": 1.3423463106155396, "learning_rate": 0.00019932634730538925, "loss": 0.5548, "step": 110 },
+     { "epoch": 0.12999322951929587, "grad_norm": 1.4929102659225464, "learning_rate": 0.00019857784431137723, "loss": 0.6701, "step": 120 },
+     { "epoch": 0.14082599864590387, "grad_norm": 0.9462954998016357, "learning_rate": 0.00019782934131736527, "loss": 0.8675, "step": 130 },
+     { "epoch": 0.15165876777251186, "grad_norm": 0.9912289977073669, "learning_rate": 0.0001970808383233533, "loss": 0.9074, "step": 140 },
+     { "epoch": 0.16249153689911983, "grad_norm": 1.1070538759231567, "learning_rate": 0.00019633233532934132, "loss": 0.8755, "step": 150 },
+     { "epoch": 0.17332430602572782, "grad_norm": 0.9465340375900269, "learning_rate": 0.00019558383233532936, "loss": 0.882, "step": 160 },
+     { "epoch": 0.18415707515233581, "grad_norm": 0.8657329678535461, "learning_rate": 0.00019483532934131737, "loss": 0.8737, "step": 170 },
+     { "epoch": 0.1949898442789438, "grad_norm": 0.7293577790260315, "learning_rate": 0.0001940868263473054, "loss": 0.8473, "step": 180 },
+     { "epoch": 0.2058226134055518, "grad_norm": 0.849353551864624, "learning_rate": 0.00019333832335329343, "loss": 0.9414, "step": 190 },
+     { "epoch": 0.2166553825321598, "grad_norm": 0.7525314688682556, "learning_rate": 0.00019258982035928144, "loss": 0.8852, "step": 200 },
+     { "epoch": 0.22748815165876776, "grad_norm": 1.0732208490371704, "learning_rate": 0.00019184131736526948, "loss": 0.8074, "step": 210 },
+     { "epoch": 0.23832092078537576, "grad_norm": 0.8420374393463135, "learning_rate": 0.0001910928143712575, "loss": 0.9508, "step": 220 },
+     { "epoch": 0.24915368991198375, "grad_norm": 0.8308244347572327, "learning_rate": 0.0001903443113772455, "loss": 0.8734, "step": 230 },
+     { "epoch": 0.25998645903859174, "grad_norm": 0.9915153384208679, "learning_rate": 0.00018959580838323354, "loss": 0.8816, "step": 240 },
+     { "epoch": 0.2708192281651997, "grad_norm": 4.8621978759765625, "learning_rate": 0.00018884730538922158, "loss": 0.8848, "step": 250 },
+     { "epoch": 0.28165199729180773, "grad_norm": 0.7945590019226074, "learning_rate": 0.0001880988023952096, "loss": 0.8503, "step": 260 },
+     { "epoch": 0.2924847664184157, "grad_norm": 0.7896672487258911, "learning_rate": 0.00018735029940119763, "loss": 0.8798, "step": 270 },
+     { "epoch": 0.3033175355450237, "grad_norm": 0.8870701789855957, "learning_rate": 0.00018660179640718564, "loss": 0.9112, "step": 280 },
+     { "epoch": 0.3141503046716317, "grad_norm": 0.9003740549087524, "learning_rate": 0.00018585329341317365, "loss": 0.846, "step": 290 },
+     { "epoch": 0.32498307379823965, "grad_norm": 0.7067676186561584, "learning_rate": 0.0001851047904191617, "loss": 0.8588, "step": 300 },
+     { "epoch": 0.3358158429248477, "grad_norm": 0.9696246385574341, "learning_rate": 0.0001843562874251497, "loss": 0.8244, "step": 310 },
+     { "epoch": 0.34664861205145564, "grad_norm": 0.9892609715461731, "learning_rate": 0.00018360778443113774, "loss": 0.8214, "step": 320 },
+     { "epoch": 0.35748138117806366, "grad_norm": 0.822260856628418, "learning_rate": 0.00018285928143712575, "loss": 0.7977, "step": 330 },
+     { "epoch": 0.36831415030467163, "grad_norm": 0.7743964791297913, "learning_rate": 0.00018211077844311376, "loss": 0.8002, "step": 340 },
+     { "epoch": 0.3791469194312796, "grad_norm": 0.7090775370597839, "learning_rate": 0.0001813622754491018, "loss": 0.8192, "step": 350 },
+     { "epoch": 0.3899796885578876, "grad_norm": 1.0970802307128906, "learning_rate": 0.00018061377245508984, "loss": 0.8516, "step": 360 },
+     { "epoch": 0.4008124576844956, "grad_norm": 0.9633163213729858, "learning_rate": 0.00017986526946107785, "loss": 0.8414, "step": 370 },
+     { "epoch": 0.4116452268111036, "grad_norm": 0.6846926808357239, "learning_rate": 0.00017911676646706587, "loss": 0.8187, "step": 380 },
+     { "epoch": 0.42247799593771157, "grad_norm": 0.7262110710144043, "learning_rate": 0.0001783682634730539, "loss": 0.8572, "step": 390 },
+     { "epoch": 0.4333107650643196, "grad_norm": 0.8537372350692749, "learning_rate": 0.00017761976047904192, "loss": 0.8286, "step": 400 },
+     { "epoch": 0.44414353419092756, "grad_norm": 0.8860271573066711, "learning_rate": 0.00017687125748502996, "loss": 0.8416, "step": 410 },
+     { "epoch": 0.4549763033175355, "grad_norm": 0.7984218597412109, "learning_rate": 0.000176122754491018, "loss": 0.8373, "step": 420 },
+     { "epoch": 0.46580907244414355, "grad_norm": 0.8060943484306335, "learning_rate": 0.000175374251497006, "loss": 0.9165, "step": 430 },
+     { "epoch": 0.4766418415707515, "grad_norm": 0.7871391177177429, "learning_rate": 0.00017462574850299402, "loss": 0.8276, "step": 440 },
+     { "epoch": 0.48747461069735953, "grad_norm": 0.7732688784599304, "learning_rate": 0.00017387724550898203, "loss": 0.8346, "step": 450 },
+     { "epoch": 0.4983073798239675, "grad_norm": 0.9314000606536865, "learning_rate": 0.00017312874251497007, "loss": 0.8291, "step": 460 },
+     { "epoch": 0.5091401489505755, "grad_norm": 0.6721988916397095, "learning_rate": 0.0001723802395209581, "loss": 0.7091, "step": 470 },
+     { "epoch": 0.5199729180771835, "grad_norm": 0.825965940952301, "learning_rate": 0.00017163173652694612, "loss": 0.8934, "step": 480 },
+     { "epoch": 0.5308056872037915, "grad_norm": 0.8427668213844299, "learning_rate": 0.00017088323353293413, "loss": 0.7603, "step": 490 },
+     { "epoch": 0.5416384563303994, "grad_norm": 1.0061259269714355, "learning_rate": 0.00017013473053892217, "loss": 0.8277, "step": 500 },
+     { "epoch": 0.5416384563303994, "eval_loss": 0.8331602811813354, "eval_runtime": 355.9061, "eval_samples_per_second": 4.614, "eval_steps_per_second": 2.307, "step": 500 },
+     { "epoch": 0.5524712254570074, "grad_norm": 0.8820628523826599, "learning_rate": 0.00016938622754491018, "loss": 0.8348, "step": 510 },
+     { "epoch": 0.5633039945836155, "grad_norm": 0.8095284700393677, "learning_rate": 0.00016863772455089822, "loss": 0.9172, "step": 520 },
+     { "epoch": 0.5741367637102234, "grad_norm": 0.6959540843963623, "learning_rate": 0.00016788922155688623, "loss": 0.838, "step": 530 },
+     { "epoch": 0.5849695328368314, "grad_norm": 0.835831880569458, "learning_rate": 0.00016714071856287424, "loss": 0.8887, "step": 540 },
+     { "epoch": 0.5958023019634394, "grad_norm": 0.9289611577987671, "learning_rate": 0.00016639221556886228, "loss": 0.8514, "step": 550 },
+     { "epoch": 0.6066350710900474, "grad_norm": 0.6904628872871399, "learning_rate": 0.00016564371257485032, "loss": 0.8645, "step": 560 },
+     { "epoch": 0.6174678402166554, "grad_norm": 0.8879178762435913, "learning_rate": 0.00016489520958083833, "loss": 0.8201, "step": 570 },
+     { "epoch": 0.6283006093432634, "grad_norm": 0.8411425948143005, "learning_rate": 0.00016414670658682637, "loss": 0.836, "step": 580 },
+     { "epoch": 0.6391333784698714, "grad_norm": 0.8564555644989014, "learning_rate": 0.00016339820359281436, "loss": 0.7724, "step": 590 },
+     { "epoch": 0.6499661475964793, "grad_norm": 0.8382830619812012, "learning_rate": 0.0001626497005988024, "loss": 0.7839, "step": 600 },
+     { "epoch": 0.6607989167230873, "grad_norm": 0.7657437920570374, "learning_rate": 0.00016190119760479043, "loss": 0.7973, "step": 610 },
+     { "epoch": 0.6716316858496953, "grad_norm": 0.7758445143699646, "learning_rate": 0.00016115269461077845, "loss": 0.8111, "step": 620 },
+     { "epoch": 0.6824644549763034, "grad_norm": 1.0041533708572388, "learning_rate": 0.00016040419161676649, "loss": 0.8359, "step": 630 },
+     { "epoch": 0.6932972241029113, "grad_norm": 0.9679577946662903, "learning_rate": 0.0001596556886227545, "loss": 0.8822, "step": 640 },
+     { "epoch": 0.7041299932295193, "grad_norm": 0.8141391277313232, "learning_rate": 0.0001589071856287425, "loss": 0.8714, "step": 650 },
+     { "epoch": 0.7149627623561273, "grad_norm": 0.7982810139656067, "learning_rate": 0.00015815868263473055, "loss": 0.856, "step": 660 },
+     { "epoch": 0.7257955314827352, "grad_norm": 0.7932000160217285, "learning_rate": 0.00015741017964071859, "loss": 0.8405, "step": 670 },
+     { "epoch": 0.7366283006093433, "grad_norm": 0.7269508242607117, "learning_rate": 0.0001566616766467066, "loss": 0.8371, "step": 680 },
+     { "epoch": 0.7474610697359513, "grad_norm": 0.9001722931861877, "learning_rate": 0.0001559131736526946, "loss": 0.8305, "step": 690 },
+     { "epoch": 0.7582938388625592, "grad_norm": 0.6795508861541748, "learning_rate": 0.00015516467065868262, "loss": 0.8324, "step": 700 },
+     { "epoch": 0.7691266079891672, "grad_norm": 0.8868729472160339, "learning_rate": 0.00015441616766467066, "loss": 0.8521, "step": 710 },
+     { "epoch": 0.7799593771157752, "grad_norm": 0.9720478653907776, "learning_rate": 0.0001536676646706587, "loss": 0.7759, "step": 720 },
+     {
525
+ "epoch": 0.7907921462423833,
526
+ "grad_norm": 0.8006075620651245,
527
+ "learning_rate": 0.0001529191616766467,
528
+ "loss": 0.7981,
529
+ "step": 730
530
+ },
531
+ {
532
+ "epoch": 0.8016249153689912,
533
+ "grad_norm": 0.9107721447944641,
534
+ "learning_rate": 0.00015217065868263475,
535
+ "loss": 0.7868,
536
+ "step": 740
537
+ },
538
+ {
539
+ "epoch": 0.8124576844955992,
540
+ "grad_norm": 0.7584466338157654,
541
+ "learning_rate": 0.00015142215568862276,
542
+ "loss": 0.7401,
543
+ "step": 750
544
+ },
545
+ {
546
+ "epoch": 0.8232904536222072,
547
+ "grad_norm": 1.0075221061706543,
548
+ "learning_rate": 0.00015067365269461077,
549
+ "loss": 0.8024,
550
+ "step": 760
551
+ },
552
+ {
553
+ "epoch": 0.8341232227488151,
554
+ "grad_norm": 0.8769344091415405,
555
+ "learning_rate": 0.0001499251497005988,
556
+ "loss": 0.7779,
557
+ "step": 770
558
+ },
559
+ {
560
+ "epoch": 0.8449559918754231,
561
+ "grad_norm": 0.84312903881073,
562
+ "learning_rate": 0.00014917664670658685,
563
+ "loss": 0.8314,
564
+ "step": 780
565
+ },
566
+ {
567
+ "epoch": 0.8557887610020312,
568
+ "grad_norm": 0.8116353750228882,
569
+ "learning_rate": 0.00014842814371257486,
570
+ "loss": 0.8146,
571
+ "step": 790
572
+ },
573
+ {
574
+ "epoch": 0.8666215301286392,
575
+ "grad_norm": 0.8301011919975281,
576
+ "learning_rate": 0.00014767964071856287,
577
+ "loss": 0.7422,
578
+ "step": 800
579
+ },
580
+ {
581
+ "epoch": 0.8774542992552471,
582
+ "grad_norm": 0.8579692244529724,
583
+ "learning_rate": 0.00014693113772455091,
584
+ "loss": 0.7442,
585
+ "step": 810
586
+ },
587
+ {
588
+ "epoch": 0.8882870683818551,
589
+ "grad_norm": 0.7513943910598755,
590
+ "learning_rate": 0.00014618263473053893,
591
+ "loss": 0.7671,
592
+ "step": 820
593
+ },
594
+ {
595
+ "epoch": 0.8991198375084631,
596
+ "grad_norm": 0.9639107584953308,
597
+ "learning_rate": 0.00014543413173652696,
598
+ "loss": 0.7896,
599
+ "step": 830
600
+ },
601
+ {
602
+ "epoch": 0.909952606635071,
603
+ "grad_norm": 0.8897636532783508,
604
+ "learning_rate": 0.00014468562874251498,
605
+ "loss": 0.7613,
606
+ "step": 840
607
+ },
608
+ {
609
+ "epoch": 0.9207853757616791,
610
+ "grad_norm": 0.7998213171958923,
611
+ "learning_rate": 0.000143937125748503,
612
+ "loss": 0.7647,
613
+ "step": 850
614
+ },
615
+ {
616
+ "epoch": 0.9316181448882871,
617
+ "grad_norm": 0.6916050910949707,
618
+ "learning_rate": 0.00014318862275449103,
619
+ "loss": 0.7697,
620
+ "step": 860
621
+ },
622
+ {
623
+ "epoch": 0.942450914014895,
624
+ "grad_norm": 1.0154324769973755,
625
+ "learning_rate": 0.00014244011976047904,
626
+ "loss": 0.7314,
627
+ "step": 870
628
+ },
629
+ {
630
+ "epoch": 0.953283683141503,
631
+ "grad_norm": 0.9787517786026001,
632
+ "learning_rate": 0.00014169161676646708,
633
+ "loss": 0.8047,
634
+ "step": 880
635
+ },
636
+ {
637
+ "epoch": 0.964116452268111,
638
+ "grad_norm": 0.6035457253456116,
639
+ "learning_rate": 0.00014094311377245512,
640
+ "loss": 0.783,
641
+ "step": 890
642
+ },
643
+ {
644
+ "epoch": 0.9749492213947191,
645
+ "grad_norm": 0.940951943397522,
646
+ "learning_rate": 0.0001401946107784431,
647
+ "loss": 0.7741,
648
+ "step": 900
649
+ },
650
+ {
651
+ "epoch": 0.985781990521327,
652
+ "grad_norm": 0.7785654067993164,
653
+ "learning_rate": 0.00013944610778443114,
654
+ "loss": 0.7855,
655
+ "step": 910
656
+ },
657
+ {
658
+ "epoch": 0.996614759647935,
659
+ "grad_norm": 0.8356137275695801,
660
+ "learning_rate": 0.00013869760479041918,
661
+ "loss": 0.8292,
662
+ "step": 920
663
+ },
664
+ {
665
+ "epoch": 1.0064996614759647,
666
+ "grad_norm": 0.6590499877929688,
667
+ "learning_rate": 0.0001379491017964072,
668
+ "loss": 0.6858,
669
+ "step": 930
670
+ },
671
+ {
672
+ "epoch": 1.0173324306025728,
673
+ "grad_norm": 1.0389671325683594,
674
+ "learning_rate": 0.00013720059880239523,
675
+ "loss": 0.6097,
676
+ "step": 940
677
+ },
678
+ {
679
+ "epoch": 1.0281651997291807,
680
+ "grad_norm": 0.9596243500709534,
681
+ "learning_rate": 0.00013645209580838324,
682
+ "loss": 0.5676,
683
+ "step": 950
684
+ },
685
+ {
686
+ "epoch": 1.0389979688557887,
687
+ "grad_norm": 1.0831798315048218,
688
+ "learning_rate": 0.00013570359281437125,
689
+ "loss": 0.6106,
690
+ "step": 960
691
+ },
692
+ {
693
+ "epoch": 1.0498307379823968,
694
+ "grad_norm": 0.92978835105896,
695
+ "learning_rate": 0.0001349550898203593,
696
+ "loss": 0.5924,
697
+ "step": 970
698
+ },
699
+ {
700
+ "epoch": 1.0606635071090047,
701
+ "grad_norm": 0.9672062993049622,
702
+ "learning_rate": 0.0001342065868263473,
703
+ "loss": 0.5496,
704
+ "step": 980
705
+ },
706
+ {
707
+ "epoch": 1.0714962762356128,
708
+ "grad_norm": 1.1402652263641357,
709
+ "learning_rate": 0.00013345808383233534,
710
+ "loss": 0.5871,
711
+ "step": 990
712
+ },
713
+ {
714
+ "epoch": 1.0823290453622207,
715
+ "grad_norm": 1.1109035015106201,
716
+ "learning_rate": 0.00013270958083832335,
717
+ "loss": 0.5424,
718
+ "step": 1000
719
+ },
720
+ {
721
+ "epoch": 1.0823290453622207,
722
+ "eval_loss": 0.8179630041122437,
723
+ "eval_runtime": 357.2769,
724
+ "eval_samples_per_second": 4.596,
725
+ "eval_steps_per_second": 2.298,
726
+ "step": 1000
727
+ },
728
+ {
729
+ "epoch": 1.0931618144888287,
730
+ "grad_norm": 0.8117087483406067,
731
+ "learning_rate": 0.00013196107784431137,
732
+ "loss": 0.5636,
733
+ "step": 1010
734
+ },
735
+ {
736
+ "epoch": 1.1039945836154368,
737
+ "grad_norm": 0.86320561170578,
738
+ "learning_rate": 0.0001312125748502994,
739
+ "loss": 0.5191,
740
+ "step": 1020
741
+ },
742
+ {
743
+ "epoch": 1.1148273527420447,
744
+ "grad_norm": 1.1274133920669556,
745
+ "learning_rate": 0.00013046407185628744,
746
+ "loss": 0.5891,
747
+ "step": 1030
748
+ },
749
+ {
750
+ "epoch": 1.1256601218686526,
751
+ "grad_norm": 1.0116336345672607,
752
+ "learning_rate": 0.00012971556886227546,
753
+ "loss": 0.5579,
754
+ "step": 1040
755
+ },
756
+ {
757
+ "epoch": 1.1364928909952607,
758
+ "grad_norm": 0.9277855157852173,
759
+ "learning_rate": 0.0001289670658682635,
760
+ "loss": 0.5971,
761
+ "step": 1050
762
+ },
763
+ {
764
+ "epoch": 1.1473256601218687,
765
+ "grad_norm": 1.0700503587722778,
766
+ "learning_rate": 0.0001282185628742515,
767
+ "loss": 0.5815,
768
+ "step": 1060
769
+ },
770
+ {
771
+ "epoch": 1.1581584292484766,
772
+ "grad_norm": 0.9346574544906616,
773
+ "learning_rate": 0.00012747005988023952,
774
+ "loss": 0.5472,
775
+ "step": 1070
776
+ },
777
+ {
778
+ "epoch": 1.1689911983750847,
779
+ "grad_norm": 1.047631025314331,
780
+ "learning_rate": 0.00012672155688622756,
781
+ "loss": 0.5479,
782
+ "step": 1080
783
+ },
784
+ {
785
+ "epoch": 1.1798239675016926,
786
+ "grad_norm": 0.9931487441062927,
787
+ "learning_rate": 0.00012597305389221557,
788
+ "loss": 0.5521,
789
+ "step": 1090
790
+ },
791
+ {
792
+ "epoch": 1.1906567366283005,
793
+ "grad_norm": 0.9764857292175293,
794
+ "learning_rate": 0.0001252245508982036,
795
+ "loss": 0.584,
796
+ "step": 1100
797
+ },
798
+ {
799
+ "epoch": 1.2014895057549086,
800
+ "grad_norm": 1.0661903619766235,
801
+ "learning_rate": 0.00012447604790419162,
802
+ "loss": 0.6101,
803
+ "step": 1110
804
+ },
805
+ {
806
+ "epoch": 1.2123222748815166,
807
+ "grad_norm": 1.0962295532226562,
808
+ "learning_rate": 0.00012372754491017963,
809
+ "loss": 0.6028,
810
+ "step": 1120
811
+ },
812
+ {
813
+ "epoch": 1.2231550440081245,
814
+ "grad_norm": 0.9794766306877136,
815
+ "learning_rate": 0.00012297904191616767,
816
+ "loss": 0.5813,
817
+ "step": 1130
818
+ },
819
+ {
820
+ "epoch": 1.2339878131347326,
821
+ "grad_norm": 0.9556275606155396,
822
+ "learning_rate": 0.0001222305389221557,
823
+ "loss": 0.5662,
824
+ "step": 1140
825
+ },
826
+ {
827
+ "epoch": 1.2448205822613405,
828
+ "grad_norm": 1.1200224161148071,
829
+ "learning_rate": 0.0001214820359281437,
830
+ "loss": 0.5642,
831
+ "step": 1150
832
+ },
833
+ {
834
+ "epoch": 1.2556533513879486,
835
+ "grad_norm": 1.0518434047698975,
836
+ "learning_rate": 0.00012073353293413175,
837
+ "loss": 0.6126,
838
+ "step": 1160
839
+ },
840
+ {
841
+ "epoch": 1.2664861205145566,
842
+ "grad_norm": 1.1709963083267212,
843
+ "learning_rate": 0.00011998502994011977,
844
+ "loss": 0.5189,
845
+ "step": 1170
846
+ },
847
+ {
848
+ "epoch": 1.2773188896411645,
849
+ "grad_norm": 0.8867760896682739,
850
+ "learning_rate": 0.00011923652694610778,
851
+ "loss": 0.6098,
852
+ "step": 1180
853
+ },
854
+ {
855
+ "epoch": 1.2881516587677724,
856
+ "grad_norm": 0.9317127466201782,
857
+ "learning_rate": 0.00011848802395209582,
858
+ "loss": 0.5667,
859
+ "step": 1190
860
+ },
861
+ {
862
+ "epoch": 1.2989844278943805,
863
+ "grad_norm": 1.1382100582122803,
864
+ "learning_rate": 0.00011773952095808385,
865
+ "loss": 0.5756,
866
+ "step": 1200
867
+ },
868
+ {
869
+ "epoch": 1.3098171970209884,
870
+ "grad_norm": 0.9819681644439697,
871
+ "learning_rate": 0.00011699101796407186,
872
+ "loss": 0.5922,
873
+ "step": 1210
874
+ },
875
+ {
876
+ "epoch": 1.3206499661475966,
877
+ "grad_norm": 1.0776174068450928,
878
+ "learning_rate": 0.00011624251497005988,
879
+ "loss": 0.5728,
880
+ "step": 1220
881
+ },
882
+ {
883
+ "epoch": 1.3314827352742045,
884
+ "grad_norm": 1.0137302875518799,
885
+ "learning_rate": 0.0001154940119760479,
886
+ "loss": 0.5603,
887
+ "step": 1230
888
+ },
889
+ {
890
+ "epoch": 1.3423155044008124,
891
+ "grad_norm": 1.1223585605621338,
892
+ "learning_rate": 0.00011474550898203593,
893
+ "loss": 0.5639,
894
+ "step": 1240
895
+ },
896
+ {
897
+ "epoch": 1.3531482735274205,
898
+ "grad_norm": 0.8942229747772217,
899
+ "learning_rate": 0.00011399700598802396,
900
+ "loss": 0.586,
901
+ "step": 1250
902
+ },
903
+ {
904
+ "epoch": 1.3639810426540284,
905
+ "grad_norm": 1.225698709487915,
906
+ "learning_rate": 0.00011324850299401197,
907
+ "loss": 0.563,
908
+ "step": 1260
909
+ },
910
+ {
911
+ "epoch": 1.3748138117806366,
912
+ "grad_norm": 1.159463882446289,
913
+ "learning_rate": 0.00011250000000000001,
914
+ "loss": 0.5898,
915
+ "step": 1270
916
+ },
917
+ {
918
+ "epoch": 1.3856465809072445,
919
+ "grad_norm": 1.0059807300567627,
920
+ "learning_rate": 0.00011175149700598804,
921
+ "loss": 0.6096,
922
+ "step": 1280
923
+ },
924
+ {
925
+ "epoch": 1.3964793500338524,
926
+ "grad_norm": 1.1433062553405762,
927
+ "learning_rate": 0.00011100299401197605,
928
+ "loss": 0.5411,
929
+ "step": 1290
930
+ },
931
+ {
932
+ "epoch": 1.4073121191604603,
933
+ "grad_norm": 1.0282905101776123,
934
+ "learning_rate": 0.00011025449101796407,
935
+ "loss": 0.5928,
936
+ "step": 1300
937
+ },
938
+ {
939
+ "epoch": 1.4181448882870684,
940
+ "grad_norm": 0.8389853835105896,
941
+ "learning_rate": 0.00010950598802395211,
942
+ "loss": 0.5657,
943
+ "step": 1310
944
+ },
945
+ {
946
+ "epoch": 1.4289776574136763,
947
+ "grad_norm": 1.132350206375122,
948
+ "learning_rate": 0.00010875748502994012,
949
+ "loss": 0.6196,
950
+ "step": 1320
951
+ },
952
+ {
953
+ "epoch": 1.4398104265402845,
954
+ "grad_norm": 1.1093621253967285,
955
+ "learning_rate": 0.00010800898203592815,
956
+ "loss": 0.5845,
957
+ "step": 1330
958
+ },
959
+ {
960
+ "epoch": 1.4506431956668924,
961
+ "grad_norm": 1.3198816776275635,
962
+ "learning_rate": 0.00010726047904191616,
963
+ "loss": 0.5711,
964
+ "step": 1340
965
+ },
966
+ {
967
+ "epoch": 1.4614759647935003,
968
+ "grad_norm": 0.8968690037727356,
969
+ "learning_rate": 0.0001065119760479042,
970
+ "loss": 0.6075,
971
+ "step": 1350
972
+ },
973
+ {
974
+ "epoch": 1.4723087339201082,
975
+ "grad_norm": 1.0248963832855225,
976
+ "learning_rate": 0.00010576347305389222,
977
+ "loss": 0.5869,
978
+ "step": 1360
979
+ },
980
+ {
981
+ "epoch": 1.4831415030467163,
982
+ "grad_norm": 1.2115412950515747,
983
+ "learning_rate": 0.00010501497005988024,
984
+ "loss": 0.549,
985
+ "step": 1370
986
+ },
987
+ {
988
+ "epoch": 1.4939742721733242,
989
+ "grad_norm": 1.1320476531982422,
990
+ "learning_rate": 0.00010426646706586826,
991
+ "loss": 0.5661,
992
+ "step": 1380
993
+ },
994
+ {
995
+ "epoch": 1.5048070412999324,
996
+ "grad_norm": 1.0099844932556152,
997
+ "learning_rate": 0.0001035179640718563,
998
+ "loss": 0.5953,
999
+ "step": 1390
1000
+ },
1001
+ {
1002
+ "epoch": 1.5156398104265403,
1003
+ "grad_norm": 0.9809553623199463,
1004
+ "learning_rate": 0.00010276946107784431,
1005
+ "loss": 0.578,
1006
+ "step": 1400
1007
+ },
1008
+ {
1009
+ "epoch": 1.5264725795531482,
1010
+ "grad_norm": 1.4169446229934692,
1011
+ "learning_rate": 0.00010202095808383234,
1012
+ "loss": 0.6173,
1013
+ "step": 1410
1014
+ },
1015
+ {
1016
+ "epoch": 1.537305348679756,
1017
+ "grad_norm": 1.1033852100372314,
1018
+ "learning_rate": 0.00010127245508982038,
1019
+ "loss": 0.5917,
1020
+ "step": 1420
1021
+ },
1022
+ {
1023
+ "epoch": 1.5481381178063642,
1024
+ "grad_norm": 1.1163372993469238,
1025
+ "learning_rate": 0.00010052395209580839,
1026
+ "loss": 0.589,
1027
+ "step": 1430
1028
+ },
1029
+ {
1030
+ "epoch": 1.5589708869329724,
1031
+ "grad_norm": 0.9786676168441772,
1032
+ "learning_rate": 9.977544910179641e-05,
1033
+ "loss": 0.5425,
1034
+ "step": 1440
1035
+ },
1036
+ {
1037
+ "epoch": 1.5698036560595803,
1038
+ "grad_norm": 1.034001111984253,
1039
+ "learning_rate": 9.902694610778444e-05,
1040
+ "loss": 0.5467,
1041
+ "step": 1450
1042
+ },
1043
+ {
1044
+ "epoch": 1.5806364251861882,
1045
+ "grad_norm": 0.8697665929794312,
1046
+ "learning_rate": 9.827844311377245e-05,
1047
+ "loss": 0.5882,
1048
+ "step": 1460
1049
+ },
1050
+ {
1051
+ "epoch": 1.591469194312796,
1052
+ "grad_norm": 1.0091935396194458,
1053
+ "learning_rate": 9.752994011976049e-05,
1054
+ "loss": 0.573,
1055
+ "step": 1470
1056
+ },
1057
+ {
1058
+ "epoch": 1.6023019634394042,
1059
+ "grad_norm": 1.0126501321792603,
1060
+ "learning_rate": 9.678143712574852e-05,
1061
+ "loss": 0.6083,
1062
+ "step": 1480
1063
+ },
1064
+ {
1065
+ "epoch": 1.6131347325660121,
1066
+ "grad_norm": 0.9271785020828247,
1067
+ "learning_rate": 9.603293413173653e-05,
1068
+ "loss": 0.5564,
1069
+ "step": 1490
1070
+ },
1071
+ {
1072
+ "epoch": 1.6239675016926203,
1073
+ "grad_norm": 1.0736253261566162,
1074
+ "learning_rate": 9.528443113772455e-05,
1075
+ "loss": 0.5696,
1076
+ "step": 1500
1077
+ },
1078
+ {
1079
+ "epoch": 1.6239675016926203,
1080
+ "eval_loss": 0.7986094355583191,
1081
+ "eval_runtime": 358.2761,
1082
+ "eval_samples_per_second": 4.583,
1083
+ "eval_steps_per_second": 2.292,
1084
+ "step": 1500
1085
+ },
1086
+ {
1087
+ "epoch": 1.6348002708192282,
1088
+ "grad_norm": 0.9671568870544434,
1089
+ "learning_rate": 9.453592814371258e-05,
1090
+ "loss": 0.5994,
1091
+ "step": 1510
1092
+ },
1093
+ {
1094
+ "epoch": 1.645633039945836,
1095
+ "grad_norm": 0.9636701345443726,
1096
+ "learning_rate": 9.37874251497006e-05,
1097
+ "loss": 0.6096,
1098
+ "step": 1520
1099
+ },
1100
+ {
1101
+ "epoch": 1.656465809072444,
1102
+ "grad_norm": 1.1323844194412231,
1103
+ "learning_rate": 9.303892215568863e-05,
1104
+ "loss": 0.5981,
1105
+ "step": 1530
1106
+ },
1107
+ {
1108
+ "epoch": 1.6672985781990521,
1109
+ "grad_norm": 1.0002387762069702,
1110
+ "learning_rate": 9.229041916167665e-05,
1111
+ "loss": 0.5807,
1112
+ "step": 1540
1113
+ },
1114
+ {
1115
+ "epoch": 1.6781313473256603,
1116
+ "grad_norm": 1.2000038623809814,
1117
+ "learning_rate": 9.154191616766468e-05,
1118
+ "loss": 0.5583,
1119
+ "step": 1550
1120
+ },
1121
+ {
1122
+ "epoch": 1.6889641164522682,
1123
+ "grad_norm": 1.153903841972351,
1124
+ "learning_rate": 9.079341317365269e-05,
1125
+ "loss": 0.6237,
1126
+ "step": 1560
1127
+ },
1128
+ {
1129
+ "epoch": 1.699796885578876,
1130
+ "grad_norm": 1.0791847705841064,
1131
+ "learning_rate": 9.004491017964072e-05,
1132
+ "loss": 0.5457,
1133
+ "step": 1570
1134
+ },
1135
+ {
1136
+ "epoch": 1.710629654705484,
1137
+ "grad_norm": 1.1212618350982666,
1138
+ "learning_rate": 8.929640718562875e-05,
1139
+ "loss": 0.551,
1140
+ "step": 1580
1141
+ },
1142
+ {
1143
+ "epoch": 1.721462423832092,
1144
+ "grad_norm": 1.219691514968872,
1145
+ "learning_rate": 8.854790419161677e-05,
1146
+ "loss": 0.6027,
1147
+ "step": 1590
1148
+ },
1149
+ {
1150
+ "epoch": 1.7322951929587,
1151
+ "grad_norm": 1.066247820854187,
1152
+ "learning_rate": 8.779940119760479e-05,
1153
+ "loss": 0.5739,
1154
+ "step": 1600
1155
+ },
1156
+ {
1157
+ "epoch": 1.7431279620853082,
1158
+ "grad_norm": 1.070609450340271,
1159
+ "learning_rate": 8.705089820359282e-05,
1160
+ "loss": 0.6113,
1161
+ "step": 1610
1162
+ },
1163
+ {
1164
+ "epoch": 1.753960731211916,
1165
+ "grad_norm": 1.377456784248352,
1166
+ "learning_rate": 8.630239520958084e-05,
1167
+ "loss": 0.5648,
1168
+ "step": 1620
1169
+ },
1170
+ {
1171
+ "epoch": 1.764793500338524,
1172
+ "grad_norm": 1.0471181869506836,
1173
+ "learning_rate": 8.555389221556887e-05,
1174
+ "loss": 0.5514,
1175
+ "step": 1630
1176
+ },
1177
+ {
1178
+ "epoch": 1.775626269465132,
1179
+ "grad_norm": 1.2327128648757935,
1180
+ "learning_rate": 8.480538922155688e-05,
1181
+ "loss": 0.6072,
1182
+ "step": 1640
1183
+ },
1184
+ {
1185
+ "epoch": 1.78645903859174,
1186
+ "grad_norm": 1.004497766494751,
1187
+ "learning_rate": 8.405688622754492e-05,
1188
+ "loss": 0.5601,
1189
+ "step": 1650
1190
+ },
1191
+ {
1192
+ "epoch": 1.797291807718348,
1193
+ "grad_norm": 1.2862775325775146,
1194
+ "learning_rate": 8.330838323353294e-05,
1195
+ "loss": 0.6073,
1196
+ "step": 1660
1197
+ },
1198
+ {
1199
+ "epoch": 1.808124576844956,
1200
+ "grad_norm": 1.0752897262573242,
1201
+ "learning_rate": 8.255988023952096e-05,
1202
+ "loss": 0.6079,
1203
+ "step": 1670
1204
+ },
1205
+ {
1206
+ "epoch": 1.818957345971564,
1207
+ "grad_norm": 1.031568169593811,
1208
+ "learning_rate": 8.1811377245509e-05,
1209
+ "loss": 0.5701,
1210
+ "step": 1680
1211
+ },
1212
+ {
1213
+ "epoch": 1.829790115098172,
1214
+ "grad_norm": 1.2067883014678955,
1215
+ "learning_rate": 8.1062874251497e-05,
1216
+ "loss": 0.6024,
1217
+ "step": 1690
1218
+ },
1219
+ {
1220
+ "epoch": 1.8406228842247798,
1221
+ "grad_norm": 1.2873584032058716,
1222
+ "learning_rate": 8.031437125748503e-05,
1223
+ "loss": 0.6033,
1224
+ "step": 1700
1225
+ },
1226
+ {
1227
+ "epoch": 1.851455653351388,
1228
+ "grad_norm": 1.1230562925338745,
1229
+ "learning_rate": 7.956586826347306e-05,
1230
+ "loss": 0.5534,
1231
+ "step": 1710
1232
+ },
1233
+ {
1234
+ "epoch": 1.862288422477996,
1235
+ "grad_norm": 1.275429129600525,
1236
+ "learning_rate": 7.881736526946108e-05,
1237
+ "loss": 0.5483,
1238
+ "step": 1720
1239
+ },
1240
+ {
1241
+ "epoch": 1.873121191604604,
1242
+ "grad_norm": 1.1561681032180786,
1243
+ "learning_rate": 7.806886227544911e-05,
1244
+ "loss": 0.5948,
1245
+ "step": 1730
1246
+ },
1247
+ {
1248
+ "epoch": 1.883953960731212,
1249
+ "grad_norm": 1.0285365581512451,
1250
+ "learning_rate": 7.732035928143713e-05,
1251
+ "loss": 0.5843,
1252
+ "step": 1740
1253
+ },
1254
+ {
1255
+ "epoch": 1.8947867298578198,
1256
+ "grad_norm": 1.257944107055664,
1257
+ "learning_rate": 7.657185628742516e-05,
1258
+ "loss": 0.5672,
1259
+ "step": 1750
1260
+ },
1261
+ {
1262
+ "epoch": 1.9056194989844277,
1263
+ "grad_norm": 1.2069061994552612,
1264
+ "learning_rate": 7.582335329341318e-05,
1265
+ "loss": 0.6312,
1266
+ "step": 1760
1267
+ },
1268
+ {
1269
+ "epoch": 1.9164522681110359,
1270
+ "grad_norm": 0.946007251739502,
1271
+ "learning_rate": 7.50748502994012e-05,
1272
+ "loss": 0.6028,
1273
+ "step": 1770
1274
+ },
1275
+ {
1276
+ "epoch": 1.927285037237644,
1277
+ "grad_norm": 1.3141242265701294,
1278
+ "learning_rate": 7.432634730538922e-05,
1279
+ "loss": 0.5762,
1280
+ "step": 1780
1281
+ },
1282
+ {
1283
+ "epoch": 1.938117806364252,
1284
+ "grad_norm": 0.9737468957901001,
1285
+ "learning_rate": 7.357784431137726e-05,
1286
+ "loss": 0.5637,
1287
+ "step": 1790
1288
+ },
1289
+ {
1290
+ "epoch": 1.9489505754908598,
1291
+ "grad_norm": 1.0719372034072876,
1292
+ "learning_rate": 7.282934131736527e-05,
1293
+ "loss": 0.5685,
1294
+ "step": 1800
1295
+ },
1296
+ {
1297
+ "epoch": 1.9597833446174677,
1298
+ "grad_norm": 0.9777527451515198,
1299
+ "learning_rate": 7.20808383233533e-05,
1300
+ "loss": 0.5823,
1301
+ "step": 1810
1302
+ },
1303
+ {
1304
+ "epoch": 1.9706161137440759,
1305
+ "grad_norm": 1.019610047340393,
1306
+ "learning_rate": 7.133233532934132e-05,
1307
+ "loss": 0.5342,
1308
+ "step": 1820
1309
+ },
1310
+ {
1311
+ "epoch": 1.9814488828706838,
1312
+ "grad_norm": 1.2895872592926025,
1313
+ "learning_rate": 7.058383233532935e-05,
1314
+ "loss": 0.5625,
1315
+ "step": 1830
1316
+ },
1317
+ {
1318
+ "epoch": 1.992281651997292,
1319
+ "grad_norm": 1.1473089456558228,
1320
+ "learning_rate": 6.983532934131737e-05,
1321
+ "loss": 0.5476,
1322
+ "step": 1840
1323
+ },
1324
+ {
1325
+ "epoch": 2.0021665538253215,
1326
+ "grad_norm": 0.9660665392875671,
1327
+ "learning_rate": 6.908682634730538e-05,
1328
+ "loss": 0.5755,
1329
+ "step": 1850
1330
+ },
1331
+ {
1332
+ "epoch": 2.0129993229519294,
1333
+ "grad_norm": 1.2142918109893799,
1334
+ "learning_rate": 6.833832335329342e-05,
1335
+ "loss": 0.3712,
1336
+ "step": 1860
1337
+ },
1338
+ {
1339
+ "epoch": 2.0238320920785378,
1340
+ "grad_norm": 1.4106266498565674,
1341
+ "learning_rate": 6.758982035928145e-05,
1342
+ "loss": 0.3645,
1343
+ "step": 1870
1344
+ },
1345
+ {
1346
+ "epoch": 2.0346648612051457,
1347
+ "grad_norm": 1.2526434659957886,
1348
+ "learning_rate": 6.684131736526946e-05,
1349
+ "loss": 0.3634,
1350
+ "step": 1880
1351
+ },
1352
+ {
1353
+ "epoch": 2.0454976303317536,
1354
+ "grad_norm": 1.2345237731933594,
1355
+ "learning_rate": 6.609281437125749e-05,
1356
+ "loss": 0.3181,
1357
+ "step": 1890
1358
+ },
1359
+ {
1360
+ "epoch": 2.0563303994583615,
1361
+ "grad_norm": 1.1664937734603882,
1362
+ "learning_rate": 6.534431137724551e-05,
1363
+ "loss": 0.3412,
1364
+ "step": 1900
1365
+ },
1366
+ {
1367
+ "epoch": 2.0671631685849694,
1368
+ "grad_norm": 1.3861303329467773,
1369
+ "learning_rate": 6.459580838323354e-05,
1370
+ "loss": 0.348,
1371
+ "step": 1910
1372
+ },
1373
+ {
1374
+ "epoch": 2.0779959377115773,
1375
+ "grad_norm": 1.4952672719955444,
1376
+ "learning_rate": 6.384730538922156e-05,
1377
+ "loss": 0.3193,
1378
+ "step": 1920
1379
+ },
1380
+ {
1381
+ "epoch": 2.0888287068381857,
1382
+ "grad_norm": 1.3477551937103271,
1383
+ "learning_rate": 6.309880239520959e-05,
1384
+ "loss": 0.3499,
1385
+ "step": 1930
1386
+ },
1387
+ {
1388
+ "epoch": 2.0996614759647936,
1389
+ "grad_norm": 1.5403518676757812,
1390
+ "learning_rate": 6.235029940119761e-05,
1391
+ "loss": 0.3742,
1392
+ "step": 1940
1393
+ },
1394
+ {
1395
+ "epoch": 2.1104942450914015,
1396
+ "grad_norm": 1.2402806282043457,
1397
+ "learning_rate": 6.160179640718562e-05,
1398
+ "loss": 0.3575,
1399
+ "step": 1950
1400
+ },
1401
+ {
1402
+ "epoch": 2.1213270142180094,
1403
+ "grad_norm": 1.0961482524871826,
1404
+ "learning_rate": 6.085329341317365e-05,
1405
+ "loss": 0.355,
1406
+ "step": 1960
1407
+ },
1408
+ {
1409
+ "epoch": 2.1321597833446173,
1410
+ "grad_norm": 1.1491034030914307,
1411
+ "learning_rate": 6.010479041916168e-05,
1412
+ "loss": 0.3555,
1413
+ "step": 1970
1414
+ },
1415
+ {
1416
+ "epoch": 2.1429925524712257,
1417
+ "grad_norm": 1.6308982372283936,
1418
+ "learning_rate": 5.9356287425149706e-05,
1419
+ "loss": 0.3408,
1420
+ "step": 1980
1421
+ },
1422
+ {
1423
+ "epoch": 2.1538253215978336,
1424
+ "grad_norm": 1.481628656387329,
1425
+ "learning_rate": 5.8607784431137725e-05,
1426
+ "loss": 0.3814,
1427
+ "step": 1990
1428
+ },
1429
+ {
1430
+ "epoch": 2.1646580907244415,
1431
+ "grad_norm": 1.2989139556884766,
1432
+ "learning_rate": 5.785928143712576e-05,
1433
+ "loss": 0.343,
1434
+ "step": 2000
1435
+ },
1436
+ {
1437
+ "epoch": 2.1646580907244415,
1438
+ "eval_loss": 0.8451017141342163,
1439
+ "eval_runtime": 357.77,
1440
+ "eval_samples_per_second": 4.59,
1441
+ "eval_steps_per_second": 2.295,
1442
+ "step": 2000
1443
+ }
1444
+ ],
1445
+ "logging_steps": 10,
1446
+ "max_steps": 2772,
1447
+ "num_input_tokens_seen": 0,
1448
+ "num_train_epochs": 3,
1449
+ "save_steps": 500,
1450
+ "stateful_callbacks": {
1451
+ "TrainerControl": {
1452
+ "args": {
1453
+ "should_epoch_stop": false,
1454
+ "should_evaluate": false,
1455
+ "should_log": false,
1456
+ "should_save": true,
1457
+ "should_training_stop": false
1458
+ },
1459
+ "attributes": {}
1460
+ }
1461
+ },
1462
+ "total_flos": 7.035689546180198e+17,
1463
+ "train_batch_size": 2,
1464
+ "trial_name": null,
1465
+ "trial_params": null
1466
+ }
checkpoint-2000/training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fb57b3addd13a91af4f53634dd1e6a17845286b1cee5e38cd21da8e2bf179c7f
3
+ size 5304
checkpoint-2500/README.md ADDED
@@ -0,0 +1,207 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model: BioMistral/BioMistral-7B
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:BioMistral/BioMistral-7B
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.16.0
checkpoint-2500/adapter_config.json ADDED
@@ -0,0 +1,41 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "BioMistral/BioMistral-7B",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 32,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.1,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "qalora_group_size": 16,
24
+ "r": 16,
25
+ "rank_pattern": {},
26
+ "revision": null,
27
+ "target_modules": [
28
+ "down_proj",
29
+ "k_proj",
30
+ "gate_proj",
31
+ "up_proj",
32
+ "q_proj",
33
+ "o_proj",
34
+ "v_proj"
35
+ ],
36
+ "task_type": "CAUSAL_LM",
37
+ "trainable_token_indices": null,
38
+ "use_dora": false,
39
+ "use_qalora": false,
40
+ "use_rslora": false
41
+ }
checkpoint-2500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cf6789607635a0fefb945b61a36bfed3dbf3bf73010588ae8a722b6f9b06b73d
3
+ size 167832240
checkpoint-2500/chat_template.jinja ADDED
@@ -0,0 +1 @@
 
 
1
+ {{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}