Upload folder using huggingface_hub
- README.md +111 -0
- adapter_config.json +37 -0
- adapter_model.safetensors +3 -0
- checkpoint-1400/README.md +202 -0
- checkpoint-1400/adapter_config.json +37 -0
- checkpoint-1400/adapter_model.safetensors +3 -0
- checkpoint-1400/optimizer.pt +3 -0
- checkpoint-1400/rng_state.pth +3 -0
- checkpoint-1400/scaler.pt +3 -0
- checkpoint-1400/scheduler.pt +3 -0
- checkpoint-1400/special_tokens_map.json +24 -0
- checkpoint-1400/tokenizer.json +0 -0
- checkpoint-1400/tokenizer.model +3 -0
- checkpoint-1400/tokenizer_config.json +0 -0
- checkpoint-1400/trainer_state.json +1135 -0
- checkpoint-1400/training_args.bin +3 -0
- checkpoint-1500/README.md +202 -0
- checkpoint-1500/adapter_config.json +37 -0
- checkpoint-1500/adapter_model.safetensors +3 -0
- checkpoint-1500/optimizer.pt +3 -0
- checkpoint-1500/rng_state.pth +3 -0
- checkpoint-1500/scaler.pt +3 -0
- checkpoint-1500/scheduler.pt +3 -0
- checkpoint-1500/special_tokens_map.json +24 -0
- checkpoint-1500/tokenizer.json +0 -0
- checkpoint-1500/tokenizer.model +3 -0
- checkpoint-1500/tokenizer_config.json +0 -0
- checkpoint-1500/trainer_state.json +1213 -0
- checkpoint-1500/training_args.bin +3 -0
- checkpoint-300/README.md +202 -0
- checkpoint-300/adapter_config.json +37 -0
- checkpoint-300/adapter_model.safetensors +3 -0
- checkpoint-300/optimizer.pt +3 -0
- checkpoint-300/rng_state.pth +3 -0
- checkpoint-300/scaler.pt +3 -0
- checkpoint-300/scheduler.pt +3 -0
- checkpoint-300/special_tokens_map.json +24 -0
- checkpoint-300/tokenizer.json +0 -0
- checkpoint-300/tokenizer.model +3 -0
- checkpoint-300/tokenizer_config.json +0 -0
- checkpoint-300/trainer_state.json +277 -0
- checkpoint-300/training_args.bin +3 -0
- config.json +41 -0
- evaluation_results.json +125 -0
- evaluation_viz/domain_comparison.png +0 -0
- evaluation_viz/metrics_correlation.png +0 -0
- evaluation_viz/overall_metrics.png +0 -0
- evaluation_viz/sample_comparisons.csv +11 -0
- evaluation_viz/survey_metrics.png +0 -0
- generation_config.json +6 -0
README.md
ADDED
@@ -0,0 +1,111 @@
# aryashah00/survey-finetuned-Mistral-7B-Instruct-v0.3

## Model Description

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) optimized for generating synthetic survey responses across multiple domains. It was instruction-tuned on a custom dataset of survey responses, each written from the perspective of a specific persona.

## Training Data

- **Dataset Size**: ~3,000 examples
- **Domains**: 10 domains, including healthcare and education
- **Format**: ChatML instruction format with system and user prompts

## Training Details

- **Base Model**: [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
- **Training Method**: Parameter-Efficient Fine-Tuning (PEFT) with LoRA
- **LoRA Parameters**: r=16, alpha=32, dropout=0.05
- **Training Setup**:
  - Batch Size: 8
  - Learning Rate: 0.0002
  - Epochs: 5

## Usage

This model is specifically designed for generating synthetic survey responses from different personas. It works best when provided with:

1. A detailed persona description
2. A specific survey question
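When several personas are surveyed on the same questions, the two inputs above can be paired programmatically. This is a minimal sketch: the prompt templates are copied from the README's Python example, while `build_messages`, `build_survey_batch` and the second persona are hypothetical illustrations. Each resulting `messages` list can then be passed to `tokenizer.apply_chat_template` in the same way as in the Python example.

```python
from itertools import product

# Prompt templates copied from the README's Python example.
SYSTEM_TEMPLATE = "You are embodying the following persona: {persona}"
USER_TEMPLATE = (
    "Survey Question: {question}\n\n"
    "Please provide your honest and detailed response to this question."
)


def build_messages(persona: str, question: str) -> list[dict[str, str]]:
    """Return one system + user conversation in the README's message format."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(persona=persona)},
        {"role": "user", "content": USER_TEMPLATE.format(question=question)},
    ]


def build_survey_batch(personas: list[str], questions: list[str]) -> list[dict]:
    """Hypothetical helper: pair every persona with every question."""
    return [
        {"persona": p, "question": q, "messages": build_messages(p, q)}
        for p, q in product(personas, questions)
    ]


personas = [
    "A nurse who educates the child about modern medical treatments and encourages a balanced approach to healthcare",
    "A retired schoolteacher living alone in a rural town",  # hypothetical persona
]
questions = ["How often was your pain well controlled during this hospital stay?"]

batch = build_survey_batch(personas, questions)
print(len(batch), batch[0]["messages"][0]["content"])
```

Using `str.format` on fixed templates keeps the persona and question text out of the template parsing, so braces inside a persona description are passed through unchanged.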

### Python Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aryashah00/survey-finetuned-Mistral-7B-Instruct-v0.3"

# Load model and tokenizer (the repo holds a LoRA adapter, so the `peft`
# package must be installed for AutoModelForCausalLM to apply it)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Define persona and question
persona = "A nurse who educates the child about modern medical treatments and encourages a balanced approach to healthcare"
question = "How often was your pain well controlled during this hospital stay?"

# Prepare prompts (single braces, so the f-strings insert the variables)
system_prompt = f"You are embodying the following persona: {persona}"
user_prompt = f"Survey Question: {question}\n\nPlease provide your honest and detailed response to this question."

# Create message format
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Apply chat template
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize; the rendered template already starts with the BOS token, so don't add it twice
input_ids = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)

# Generate response
with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens (everything after the prompt)
new_tokens = output_ids[0][input_ids.shape[1]:]
generated_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(f"Generated response: {generated_response}")
```

### Inference API Example

```python
import requests
from transformers import AutoTokenizer

API_URL = "https://api-inference.huggingface.co/models/aryashah00/survey-finetuned-Mistral-7B-Instruct-v0.3"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

messages = [
    {"role": "system", "content": "You are embodying the following persona: A nurse who educates the child about modern medical treatments and encourages a balanced approach to healthcare"},
    {"role": "user", "content": "Survey Question: How often was your pain well controlled during this hospital stay?\n\nPlease provide your honest and detailed response to this question."},
]

# The text-generation task expects a plain-string prompt, so render the
# chat template locally before sending it
tokenizer = AutoTokenizer.from_pretrained("aryashah00/survey-finetuned-Mistral-7B-Instruct-v0.3")
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

output = query({"inputs": prompt, "parameters": {"max_new_tokens": 256}})
print(output)
```

## Limitations

- The model is optimized for survey response generation and may not perform well on other tasks
- Response quality depends on the clarity and specificity of the persona and question
- The model may occasionally generate responses that don't fully align with the given persona

## License

This model follows the license of the base model [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3).
adapter_config.json
ADDED
@@ -0,0 +1,37 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
  "bias": "none",
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "up_proj",
    "gate_proj",
    "q_proj",
    "k_proj",
    "v_proj",
    "down_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:67780d5284cd1c059d99826e0cec6b5f8e1aa17cc818632dbf1f772a9ed7c594
size 167832240
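The adapter size in this LFS pointer can be cross-checked against `adapter_config.json` with a back-of-the-envelope count. This sketch **assumes** Mistral-7B's standard dimensions (hidden size 4096, MLP size 14336, 32 layers, 32 query heads and 8 key/value heads of dimension 128), which are not stated in this diff. Each LoRA-wrapped linear layer gains an `r × in` matrix A and an `out × r` matrix B.

```python
# ASSUMED Mistral-7B dimensions (base model config, not part of this diff).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN // NUM_HEADS    # 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # 1024 (grouped-query attention)

R = 16      # "r" in adapter_config.json
ALPHA = 32  # "lora_alpha" in adapter_config.json

# (in_features, out_features) of each module listed in "target_modules".
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in LoRA's A (r x in) and B (out x r) matrices."""
    return r * in_features + out_features * r


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total_params = per_layer * NUM_LAYERS
fp32_bytes = total_params * 4  # 4 bytes per float32 value
scaling = ALPHA / R            # "use_rslora": false, so the scale is alpha / r

print(per_layer, total_params, fp32_bytes, scaling)
```

This gives 41,943,040 LoRA parameters, or 167,772,160 bytes in fp32, about 60 KB less than the 167,832,240-byte `adapter_model.safetensors`. The remainder is presumably the safetensors header, which suggests (but does not prove) that the adapter weights were saved in fp32 and that the LoRA update is scaled by 2.0.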
checkpoint-1400/README.md
ADDED
@@ -0,0 +1,202 @@
---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.14.0
checkpoint-1400/adapter_config.json
ADDED
@@ -0,0 +1,37 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
  "bias": "none",
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "up_proj",
    "gate_proj",
    "q_proj",
    "k_proj",
    "v_proj",
    "down_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
checkpoint-1400/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5c9fe12189951c27bc4ec00f17a9b085692fbb67a7dfe542f41378f72a0120ee
size 167832240
checkpoint-1400/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9dc057d2cdebfef0ca8530d4e60f72f82aa4325d54404ed7365bbd8cafee266
size 335922386
checkpoint-1400/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:df43d159e9e4ceee25b89f491cc92d93845d8fad5eff3afe150b55e30ed9f075
size 14244
checkpoint-1400/scaler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0c270be34d4aa9e6eae88f85b9db4fb0a97071ab2b89975397702aa1d9b12293
size 988
checkpoint-1400/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:969f48d239c448886bc74ccec632df22e96e26849a01486ea41159b4b9a7db28
size 1064
checkpoint-1400/special_tokens_map.json
ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
checkpoint-1400/tokenizer.json
ADDED
The diff for this file is too large to render.
checkpoint-1400/tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
size 587404
checkpoint-1400/tokenizer_config.json
ADDED
The diff for this file is too large to render.
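The `checkpoint-1400/trainer_state.json` diff that follows mixes training-loss entries and evaluation entries in `log_history`, and its header records `best_global_step` 300 with `best_metric` 1.0371309518814087. A minimal standard-library sketch for pulling out the evaluation curve is shown here. The embedded JSON is a trimmed excerpt of entries visible in this view, and the functions are hypothetical helpers that **assume** `eval_loss` (lower is better) is the tracked metric, which matches the recorded `best_metric`.

```python
import json

# Trimmed excerpt of checkpoint-1400/trainer_state.json: header fields, one
# training entry, and the three evaluation entries visible in this view.
TRAINER_STATE = """
{
  "best_global_step": 300,
  "best_metric": 1.0371309518814087,
  "log_history": [
    {"epoch": 0.2801120448179272, "loss": 1.1154, "step": 100},
    {"epoch": 0.2801120448179272, "eval_loss": 1.0852699279785156, "step": 100},
    {"epoch": 0.5602240896358543, "eval_loss": 1.0578807592391968, "step": 200},
    {"epoch": 0.8403361344537815, "eval_loss": 1.0371309518814087, "step": 300}
  ]
}
"""


def eval_curve(state: dict) -> list[tuple[int, float]]:
    """(step, eval_loss) pairs; training entries carry "loss" instead."""
    return [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]


def best_eval(state: dict) -> tuple[int, float]:
    """Lowest evaluation loss (assumes lower eval_loss is better)."""
    return min(eval_curve(state), key=lambda pair: pair[1])


state = json.loads(TRAINER_STATE)
best_step, best_loss = best_eval(state)
print(eval_curve(state))
print(best_step, best_loss, best_step == state["best_global_step"])
```

On a local copy of the full file, `json.loads(TRAINER_STATE)` can be swapped for loading the file itself to include the later evaluation entries that this view truncates. In the visible part of the log, the training loss falls from 1.0527 at step 350 to 0.6759 at step 370, just after the first epoch boundary, while the recorded best checkpoint stays at step 300; the later evaluation entries are worth comparing before choosing between checkpoints.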
checkpoint-1400/trainer_state.json
ADDED
@@ -0,0 +1,1135 @@
1 |
+
{
|
2 |
+
"best_global_step": 300,
|
3 |
+
"best_metric": 1.0371309518814087,
|
4 |
+
"best_model_checkpoint": "./results/survey-finetuned-Mistral-7B-Instruct-v0.3/checkpoint-300",
|
5 |
+
"epoch": 3.9215686274509802,
|
6 |
+
"eval_steps": 100,
|
7 |
+
"global_step": 1400,
|
8 |
+
"is_hyper_param_search": false,
|
9 |
+
"is_local_process_zero": true,
|
10 |
+
"is_world_process_zero": true,
|
11 |
+
"log_history": [
|
12 |
+
{
|
13 |
+
"epoch": 0.028011204481792718,
|
14 |
+
"grad_norm": 5.871906757354736,
|
15 |
+
"learning_rate": 2.962962962962963e-05,
|
16 |
+
"loss": 2.3552,
|
17 |
+
"step": 10
|
18 |
+
},
|
19 |
+
{
|
20 |
+
"epoch": 0.056022408963585436,
|
21 |
+
"grad_norm": 2.389425039291382,
|
22 |
+
"learning_rate": 6.296296296296296e-05,
|
23 |
+
"loss": 1.7094,
|
24 |
+
"step": 20
|
25 |
+
},
|
26 |
+
{
|
27 |
+
"epoch": 0.08403361344537816,
|
28 |
+
"grad_norm": 2.160494804382324,
|
29 |
+
"learning_rate": 0.0001,
|
30 |
+
"loss": 1.4552,
|
31 |
+
"step": 30
|
32 |
+
},
|
33 |
+
{
|
34 |
+
"epoch": 0.11204481792717087,
|
35 |
+
"grad_norm": 2.3963708877563477,
|
36 |
+
"learning_rate": 0.00013703703703703705,
|
37 |
+
"loss": 1.2928,
|
38 |
+
"step": 40
|
39 |
+
},
|
40 |
+
{
|
41 |
+
"epoch": 0.1400560224089636,
|
42 |
+
"grad_norm": 2.192927598953247,
|
43 |
+
"learning_rate": 0.00017407407407407408,
|
44 |
+
"loss": 1.1542,
|
45 |
+
"step": 50
|
46 |
+
},
|
47 |
+
{
|
48 |
+
"epoch": 0.16806722689075632,
|
49 |
+
"grad_norm": 1.6872493028640747,
|
50 |
+
"learning_rate": 0.00019999851776504612,
|
51 |
+
"loss": 1.1238,
|
52 |
+
"step": 60
|
53 |
+
},
|
54 |
+
{
|
55 |
+
"epoch": 0.19607843137254902,
|
56 |
+
"grad_norm": 1.8370214700698853,
|
57 |
+
"learning_rate": 0.00019997216814377874,
|
58 |
+
"loss": 1.1234,
|
59 |
+
"step": 70
|
60 |
+
},
|
61 |
+
{
|
62 |
+
"epoch": 0.22408963585434175,
|
63 |
+
"grad_norm": 1.7727515697479248,
|
64 |
+
"learning_rate": 0.0001999128899579094,
|
65 |
+
"loss": 1.1143,
|
66 |
+
"step": 80
|
67 |
+
},
|
68 |
+
{
|
69 |
+
"epoch": 0.25210084033613445,
|
70 |
+
"grad_norm": 1.5649667978286743,
|
71 |
+
"learning_rate": 0.00019982070273232797,
|
72 |
+
"loss": 1.112,
|
73 |
+
"step": 90
|
74 |
+
},
|
75 |
+
{
|
76 |
+
"epoch": 0.2801120448179272,
|
77 |
+
"grad_norm": 1.5107511281967163,
|
78 |
+
"learning_rate": 0.0001996956368314155,
|
79 |
+
"loss": 1.1154,
|
80 |
+
"step": 100
|
81 |
+
},
|
82 |
+
{
|
83 |
+
"epoch": 0.2801120448179272,
|
84 |
+
"eval_loss": 1.0852699279785156,
|
85 |
+
"eval_runtime": 25.3067,
|
86 |
+
"eval_samples_per_second": 14.107,
|
87 |
+
"eval_steps_per_second": 1.778,
|
88 |
+
"step": 100
|
89 |
+
},
|
90 |
+
{
|
91 |
+
"epoch": 0.3081232492997199,
|
92 |
+
"grad_norm": 1.5419102907180786,
|
93 |
+
"learning_rate": 0.00019953773344904302,
|
94 |
+
"loss": 1.0686,
|
95 |
+
"step": 110
|
96 |
+
},
|
97 |
+
{
|
98 |
+
"epoch": 0.33613445378151263,
|
99 |
+
"grad_norm": 1.5007447004318237,
|
100 |
+
"learning_rate": 0.00019934704459500312,
|
101 |
+
"loss": 1.0849,
|
102 |
+
"step": 120
|
103 |
+
},
|
104 |
+
{
|
105 |
+
"epoch": 0.3641456582633053,
|
106 |
+
"grad_norm": 1.67251455783844,
|
107 |
+
"learning_rate": 0.000199123633077879,
|
108 |
+
"loss": 1.0793,
|
109 |
+
"step": 130
|
110 |
+
},
|
111 |
+
{
|
112 |
+
"epoch": 0.39215686274509803,
|
113 |
+
"grad_norm": 1.4843816757202148,
|
114 |
+
"learning_rate": 0.00019886757248435704,
|
115 |
+
"loss": 1.0122,
|
116 |
+
"step": 140
|
117 |
+
},
|
118 |
+
{
|
119 |
+
"epoch": 0.42016806722689076,
|
120 |
+
"grad_norm": 1.4718296527862549,
|
121 |
+
"learning_rate": 0.00019857894715498873,
|
122 |
+
"loss": 1.0423,
|
123 |
+
"step": 150
|
124 |
+
},
|
125 |
+
{
|
126 |
+
"epoch": 0.4481792717086835,
|
127 |
+
"grad_norm": 1.4704537391662598,
|
128 |
+
"learning_rate": 0.00019825785215641086,
|
129 |
+
"loss": 1.0524,
|
130 |
+
"step": 160
|
131 |
+
},
|
132 |
+
{
|
133 |
+
"epoch": 0.47619047619047616,
|
134 |
+
"grad_norm": 1.4213836193084717,
|
135 |
+
"learning_rate": 0.0001979043932500329,
|
136 |
+
"loss": 1.0373,
|
137 |
+
"step": 170
|
138 |
+
},
|
139 |
+
{
|
140 |
+
"epoch": 0.5042016806722689,
|
141 |
+
"grad_norm": 1.579439401626587,
|
142 |
+
"learning_rate": 0.00019751868685720134,
|
143 |
+
"loss": 1.0539,
|
144 |
+
"step": 180
|
145 |
+
},
|
146 |
+
{
|
147 |
+
"epoch": 0.5322128851540616,
|
148 |
+
"grad_norm": 1.4878956079483032,
|
149 |
+
"learning_rate": 0.00019710086002085335,
|
150 |
+
"loss": 1.0315,
|
151 |
+
"step": 190
|
152 |
+
},
|
153 |
+
{
|
154 |
+
"epoch": 0.5602240896358543,
|
155 |
+
"grad_norm": 1.5998858213424683,
|
156 |
+
"learning_rate": 0.0001966510503636716,
|
157 |
+
"loss": 1.038,
|
158 |
+
"step": 200
|
159 |
+
},
|
160 |
+
{
|
161 |
+
"epoch": 0.5602240896358543,
|
162 |
+
"eval_loss": 1.0578807592391968,
|
163 |
+
"eval_runtime": 25.2375,
|
164 |
+
"eval_samples_per_second": 14.146,
|
165 |
+
"eval_steps_per_second": 1.783,
|
166 |
+
"step": 200
|
167 |
+
},
|
168 |
+
{
|
169 |
+
"epoch": 0.5882352941176471,
|
170 |
+
      "grad_norm": 1.5214334726333618,
      "learning_rate": 0.00019616940604275454,
      "loss": 1.0515,
      "step": 210
    },
    {
      "epoch": 0.6162464985994398,
      "grad_norm": 1.7388910055160522,
      "learning_rate": 0.00019565608570081672,
      "loss": 1.07,
      "step": 220
    },
    {
      "epoch": 0.6442577030812325,
      "grad_norm": 1.5043017864227295,
      "learning_rate": 0.00019511125841393558,
      "loss": 1.0545,
      "step": 230
    },
    {
      "epoch": 0.6722689075630253,
      "grad_norm": 1.5737371444702148,
      "learning_rate": 0.00019453510363586155,
      "loss": 1.0546,
      "step": 240
    },
    {
      "epoch": 0.7002801120448179,
      "grad_norm": 1.5473017692565918,
      "learning_rate": 0.00019392781113891007,
      "loss": 1.0306,
      "step": 250
    },
    {
      "epoch": 0.7282913165266106,
      "grad_norm": 1.6050424575805664,
      "learning_rate": 0.00019328958095145496,
      "loss": 1.0285,
      "step": 260
    },
    {
      "epoch": 0.7563025210084033,
      "grad_norm": 1.4448118209838867,
      "learning_rate": 0.00019262062329204356,
      "loss": 1.0444,
      "step": 270
    },
    {
      "epoch": 0.7843137254901961,
      "grad_norm": 1.7479037046432495,
      "learning_rate": 0.0001919211585001554,
      "loss": 1.0341,
      "step": 280
    },
    {
      "epoch": 0.8123249299719888,
      "grad_norm": 1.4101306200027466,
      "learning_rate": 0.00019119141696362759,
      "loss": 1.0244,
      "step": 290
    },
    {
      "epoch": 0.8403361344537815,
      "grad_norm": 1.4832830429077148,
      "learning_rate": 0.00019043163904277016,
      "loss": 1.0407,
      "step": 300
    },
    {
      "epoch": 0.8403361344537815,
      "eval_loss": 1.0371309518814087,
      "eval_runtime": 25.2977,
      "eval_samples_per_second": 14.112,
      "eval_steps_per_second": 1.779,
      "step": 300
    },
    {
      "epoch": 0.8683473389355743,
      "grad_norm": 1.4755833148956299,
      "learning_rate": 0.00018964207499119686,
      "loss": 1.0462,
      "step": 310
    },
    {
      "epoch": 0.896358543417367,
      "grad_norm": 1.4913283586502075,
      "learning_rate": 0.0001888229848733972,
      "loss": 1.0474,
      "step": 320
    },
    {
      "epoch": 0.9243697478991597,
      "grad_norm": 1.7317923307418823,
      "learning_rate": 0.0001879746384790773,
      "loss": 1.0447,
      "step": 330
    },
    {
      "epoch": 0.9523809523809523,
      "grad_norm": 1.5321850776672363,
      "learning_rate": 0.0001870973152342971,
      "loss": 1.0225,
      "step": 340
    },
    {
      "epoch": 0.9803921568627451,
      "grad_norm": 1.5264012813568115,
      "learning_rate": 0.00018619130410943386,
      "loss": 1.0527,
      "step": 350
    },
    {
      "epoch": 1.0084033613445378,
      "grad_norm": 1.5999464988708496,
      "learning_rate": 0.0001852569035240018,
      "loss": 0.957,
      "step": 360
    },
    {
      "epoch": 1.0364145658263306,
      "grad_norm": 1.4261401891708374,
      "learning_rate": 0.00018429442124835954,
      "loss": 0.6759,
      "step": 370
    },
    {
      "epoch": 1.0644257703081232,
      "grad_norm": 1.6350306272506714,
      "learning_rate": 0.00018330417430233756,
      "loss": 0.6721,
      "step": 380
    },
    {
      "epoch": 1.092436974789916,
      "grad_norm": 1.4598591327667236,
      "learning_rate": 0.00018228648885081886,
      "loss": 0.6695,
      "step": 390
    },
    {
      "epoch": 1.1204481792717087,
      "grad_norm": 1.6912418603897095,
      "learning_rate": 0.00018124170009630787,
      "loss": 0.6482,
      "step": 400
    },
    {
      "epoch": 1.1204481792717087,
      "eval_loss": 1.0602205991744995,
      "eval_runtime": 25.4667,
      "eval_samples_per_second": 14.018,
      "eval_steps_per_second": 1.767,
      "step": 400
    },
    {
      "epoch": 1.1484593837535013,
      "grad_norm": 1.6862925291061401,
      "learning_rate": 0.0001801701521685223,
      "loss": 0.6921,
      "step": 410
    },
    {
      "epoch": 1.1764705882352942,
      "grad_norm": 1.8303461074829102,
      "learning_rate": 0.00017907219801104458,
      "loss": 0.6826,
      "step": 420
    },
    {
      "epoch": 1.2044817927170868,
      "grad_norm": 2.0269150733947754,
      "learning_rate": 0.00017794819926507014,
      "loss": 0.6765,
      "step": 430
    },
    {
      "epoch": 1.2324929971988796,
      "grad_norm": 1.7937819957733154,
      "learning_rate": 0.00017679852615029123,
      "loss": 0.6884,
      "step": 440
    },
    {
      "epoch": 1.2605042016806722,
      "grad_norm": 1.731198787689209,
      "learning_rate": 0.00017562355734295477,
      "loss": 0.6783,
      "step": 450
    },
    {
      "epoch": 1.2885154061624648,
      "grad_norm": 1.8679757118225098,
      "learning_rate": 0.00017442367985113504,
      "loss": 0.6689,
      "step": 460
    },
    {
      "epoch": 1.3165266106442577,
      "grad_norm": 1.6848735809326172,
      "learning_rate": 0.0001731992888872621,
      "loss": 0.6937,
      "step": 470
    },
    {
      "epoch": 1.3445378151260505,
      "grad_norm": 1.5047993659973145,
      "learning_rate": 0.00017195078773794755,
      "loss": 0.6919,
      "step": 480
    },
    {
      "epoch": 1.3725490196078431,
      "grad_norm": 1.7145918607711792,
      "learning_rate": 0.00017067858763115132,
      "loss": 0.6694,
      "step": 490
    },
    {
      "epoch": 1.4005602240896358,
      "grad_norm": 1.6695507764816284,
      "learning_rate": 0.00016938310760073236,
      "loss": 0.6665,
      "step": 500
    },
    {
      "epoch": 1.4005602240896358,
      "eval_loss": 1.0960317850112915,
      "eval_runtime": 25.678,
      "eval_samples_per_second": 13.903,
      "eval_steps_per_second": 1.752,
      "step": 500
    },
    {
      "epoch": 1.4285714285714286,
      "grad_norm": 1.9052048921585083,
      "learning_rate": 0.00016806477434842836,
      "loss": 0.6711,
      "step": 510
    },
    {
      "epoch": 1.4565826330532212,
      "grad_norm": 1.7252157926559448,
      "learning_rate": 0.00016672402210331,
      "loss": 0.7007,
      "step": 520
    },
    {
      "epoch": 1.484593837535014,
      "grad_norm": 1.7970322370529175,
      "learning_rate": 0.00016536129247875567,
      "loss": 0.7164,
      "step": 530
    },
    {
      "epoch": 1.5126050420168067,
      "grad_norm": 1.7098366022109985,
      "learning_rate": 0.00016397703432699404,
      "loss": 0.7117,
      "step": 540
    },
    {
      "epoch": 1.5406162464985993,
      "grad_norm": 1.853852391242981,
      "learning_rate": 0.00016257170359126237,
      "loss": 0.7123,
      "step": 550
    },
    {
      "epoch": 1.5686274509803921,
      "grad_norm": 2.119030714035034,
      "learning_rate": 0.000161145763155629,
      "loss": 0.6952,
      "step": 560
    },
    {
      "epoch": 1.596638655462185,
      "grad_norm": 1.7882400751113892,
      "learning_rate": 0.00015969968269253013,
      "loss": 0.6957,
      "step": 570
    },
    {
      "epoch": 1.6246498599439776,
      "grad_norm": 2.0184831619262695,
      "learning_rate": 0.00015823393850806998,
      "loss": 0.6972,
      "step": 580
    },
    {
      "epoch": 1.6526610644257702,
      "grad_norm": 1.5946452617645264,
      "learning_rate": 0.00015674901338513672,
      "loss": 0.6888,
      "step": 590
    },
    {
      "epoch": 1.680672268907563,
      "grad_norm": 1.7266991138458252,
      "learning_rate": 0.0001552453964243845,
      "loss": 0.7029,
      "step": 600
    },
    {
      "epoch": 1.680672268907563,
      "eval_loss": 1.0715969800949097,
      "eval_runtime": 25.7893,
      "eval_samples_per_second": 13.843,
      "eval_steps_per_second": 1.745,
      "step": 600
    },
    {
      "epoch": 1.708683473389356,
      "grad_norm": 1.7430881261825562,
      "learning_rate": 0.000153723582883135,
      "loss": 0.7032,
      "step": 610
    },
    {
      "epoch": 1.7366946778711485,
      "grad_norm": 1.88009774684906,
      "learning_rate": 0.00015218407401225057,
      "loss": 0.6902,
      "step": 620
    },
    {
      "epoch": 1.7647058823529411,
      "grad_norm": 1.7102521657943726,
      "learning_rate": 0.000150627376891034,
      "loss": 0.6866,
      "step": 630
    },
    {
      "epoch": 1.7927170868347337,
      "grad_norm": 1.854864239692688,
      "learning_rate": 0.0001490540042602077,
      "loss": 0.6944,
      "step": 640
    },
    {
      "epoch": 1.8207282913165266,
      "grad_norm": 1.61717689037323,
      "learning_rate": 0.0001474644743530288,
      "loss": 0.7043,
      "step": 650
    },
    {
      "epoch": 1.8487394957983194,
      "grad_norm": 1.7749794721603394,
      "learning_rate": 0.00014585931072459474,
      "loss": 0.7008,
      "step": 660
    },
    {
      "epoch": 1.876750700280112,
      "grad_norm": 1.7132004499435425,
      "learning_rate": 0.00014423904207939615,
      "loss": 0.7167,
      "step": 670
    },
    {
      "epoch": 1.9047619047619047,
      "grad_norm": 1.784264326095581,
      "learning_rate": 0.0001426042020971736,
      "loss": 0.7203,
      "step": 680
    },
    {
      "epoch": 1.9327731092436975,
      "grad_norm": 1.8548870086669922,
      "learning_rate": 0.00014095532925713553,
      "loss": 0.7198,
      "step": 690
    },
    {
      "epoch": 1.9607843137254903,
      "grad_norm": 1.4964855909347534,
      "learning_rate": 0.00013929296666059544,
      "loss": 0.7053,
      "step": 700
    },
    {
      "epoch": 1.9607843137254903,
      "eval_loss": 1.0620192289352417,
      "eval_runtime": 25.7481,
      "eval_samples_per_second": 13.865,
      "eval_steps_per_second": 1.748,
      "step": 700
    },
    {
      "epoch": 1.988795518207283,
      "grad_norm": 1.7073538303375244,
      "learning_rate": 0.00013761766185208663,
      "loss": 0.7338,
      "step": 710
    },
    {
      "epoch": 2.0168067226890756,
      "grad_norm": 1.4009623527526855,
      "learning_rate": 0.00013592996663901354,
      "loss": 0.4904,
      "step": 720
    },
    {
      "epoch": 2.044817927170868,
      "grad_norm": 1.6378936767578125,
      "learning_rate": 0.00013423043690989874,
      "loss": 0.3822,
      "step": 730
    },
    {
      "epoch": 2.0728291316526612,
      "grad_norm": 1.8590551614761353,
      "learning_rate": 0.00013251963245128603,
      "loss": 0.3615,
      "step": 740
    },
    {
      "epoch": 2.100840336134454,
      "grad_norm": 1.7983025312423706,
      "learning_rate": 0.00013079811676335935,
      "loss": 0.356,
      "step": 750
    },
    {
      "epoch": 2.1288515406162465,
      "grad_norm": 2.0292270183563232,
      "learning_rate": 0.00012906645687433857,
      "loss": 0.3672,
      "step": 760
    },
    {
      "epoch": 2.156862745098039,
      "grad_norm": 1.774778127670288,
      "learning_rate": 0.00012732522315371338,
      "loss": 0.3758,
      "step": 770
    },
    {
      "epoch": 2.184873949579832,
      "grad_norm": 1.9023045301437378,
      "learning_rate": 0.0001255749891243764,
      "loss": 0.3674,
      "step": 780
    },
    {
      "epoch": 2.212885154061625,
      "grad_norm": 1.4395744800567627,
      "learning_rate": 0.00012381633127371783,
      "loss": 0.3723,
      "step": 790
    },
    {
      "epoch": 2.2408963585434174,
      "grad_norm": 1.6131172180175781,
      "learning_rate": 0.00012204982886374348,
      "loss": 0.3699,
      "step": 800
    },
    {
      "epoch": 2.2408963585434174,
      "eval_loss": 1.2616853713989258,
      "eval_runtime": 26.08,
      "eval_samples_per_second": 13.689,
      "eval_steps_per_second": 1.725,
      "step": 800
    },
    {
      "epoch": 2.26890756302521,
      "grad_norm": 1.782162070274353,
      "learning_rate": 0.00012027606374027915,
      "loss": 0.3834,
      "step": 810
    },
    {
      "epoch": 2.2969187675070026,
      "grad_norm": 1.7493244409561157,
      "learning_rate": 0.00011849562014132367,
      "loss": 0.3722,
      "step": 820
    },
    {
      "epoch": 2.3249299719887957,
      "grad_norm": 1.5457279682159424,
      "learning_rate": 0.00011670908450461421,
      "loss": 0.3836,
      "step": 830
    },
    {
      "epoch": 2.3529411764705883,
      "grad_norm": 1.914865255355835,
      "learning_rate": 0.00011491704527446705,
      "loss": 0.3929,
      "step": 840
    },
    {
      "epoch": 2.380952380952381,
      "grad_norm": 1.9673585891723633,
      "learning_rate": 0.00011312009270795729,
      "loss": 0.3736,
      "step": 850
    },
    {
      "epoch": 2.4089635854341735,
      "grad_norm": 1.508853554725647,
      "learning_rate": 0.00011131881868050152,
      "loss": 0.3818,
      "step": 860
    },
    {
      "epoch": 2.4369747899159666,
      "grad_norm": 1.633651852607727,
      "learning_rate": 0.00010951381649090766,
      "loss": 0.3706,
      "step": 870
    },
    {
      "epoch": 2.4649859943977592,
      "grad_norm": 1.7917840480804443,
      "learning_rate": 0.00010770568066595556,
      "loss": 0.372,
      "step": 880
    },
    {
      "epoch": 2.492997198879552,
      "grad_norm": 1.7907308340072632,
      "learning_rate": 0.00010589500676457358,
      "loss": 0.392,
      "step": 890
    },
    {
      "epoch": 2.5210084033613445,
      "grad_norm": 1.6397713422775269,
      "learning_rate": 0.00010408239118167482,
      "loss": 0.3866,
      "step": 900
    },
    {
      "epoch": 2.5210084033613445,
      "eval_loss": 1.254116177558899,
      "eval_runtime": 26.2156,
      "eval_samples_per_second": 13.618,
      "eval_steps_per_second": 1.717,
      "step": 900
    },
    {
      "epoch": 2.549019607843137,
      "grad_norm": 1.613955020904541,
      "learning_rate": 0.00010226843095171827,
      "loss": 0.3973,
      "step": 910
    },
    {
      "epoch": 2.5770308123249297,
      "grad_norm": 1.7094041109085083,
      "learning_rate": 0.00010045372355205915,
      "loss": 0.3697,
      "step": 920
    },
    {
      "epoch": 2.6050420168067228,
      "grad_norm": 1.6542613506317139,
      "learning_rate": 9.86388667061534e-05,
      "loss": 0.3773,
      "step": 930
    },
    {
      "epoch": 2.6330532212885154,
      "grad_norm": 1.57974374294281,
      "learning_rate": 9.68244581866812e-05,
      "loss": 0.381,
      "step": 940
    },
    {
      "epoch": 2.661064425770308,
      "grad_norm": 1.8591638803482056,
      "learning_rate": 9.501109561865415e-05,
      "loss": 0.3722,
      "step": 950
    },
    {
      "epoch": 2.689075630252101,
      "grad_norm": 1.8032310009002686,
      "learning_rate": 9.319937628257121e-05,
      "loss": 0.3812,
      "step": 960
    },
    {
      "epoch": 2.7170868347338937,
      "grad_norm": 1.6205816268920898,
      "learning_rate": 9.1389896917688e-05,
      "loss": 0.358,
      "step": 970
    },
    {
      "epoch": 2.7450980392156863,
      "grad_norm": 1.5483547449111938,
      "learning_rate": 8.958325352546433e-05,
      "loss": 0.3829,
      "step": 980
    },
    {
      "epoch": 2.773109243697479,
      "grad_norm": 1.7492879629135132,
      "learning_rate": 8.778004117325508e-05,
      "loss": 0.3832,
      "step": 990
    },
    {
      "epoch": 2.8011204481792715,
      "grad_norm": 1.752061128616333,
      "learning_rate": 8.598085379830831e-05,
      "loss": 0.3708,
      "step": 1000
    },
    {
      "epoch": 2.8011204481792715,
      "eval_loss": 1.242302417755127,
      "eval_runtime": 26.0048,
      "eval_samples_per_second": 13.728,
      "eval_steps_per_second": 1.73,
      "step": 1000
    },
    {
      "epoch": 2.8291316526610646,
      "grad_norm": 1.7680628299713135,
      "learning_rate": 8.418628401213588e-05,
      "loss": 0.3916,
      "step": 1010
    },
    {
      "epoch": 2.857142857142857,
      "grad_norm": 2.042778253555298,
      "learning_rate": 8.239692290532066e-05,
      "loss": 0.375,
      "step": 1020
    },
    {
      "epoch": 2.88515406162465,
      "grad_norm": 1.5636996030807495,
      "learning_rate": 8.061335985282463e-05,
      "loss": 0.371,
      "step": 1030
    },
    {
      "epoch": 2.9131652661064424,
      "grad_norm": 2.1579694747924805,
      "learning_rate": 7.883618231986223e-05,
      "loss": 0.3728,
      "step": 1040
    },
    {
      "epoch": 2.9411764705882355,
      "grad_norm": 1.8203998804092407,
      "learning_rate": 7.706597566840258e-05,
      "loss": 0.3752,
      "step": 1050
    },
    {
      "epoch": 2.969187675070028,
      "grad_norm": 1.7312395572662354,
      "learning_rate": 7.53033229643646e-05,
      "loss": 0.3789,
      "step": 1060
    },
    {
      "epoch": 2.9971988795518207,
      "grad_norm": 1.616680383682251,
      "learning_rate": 7.354880478556818e-05,
      "loss": 0.3682,
      "step": 1070
    },
    {
      "epoch": 3.0252100840336134,
      "grad_norm": 1.6187553405761719,
      "learning_rate": 7.18029990305052e-05,
      "loss": 0.2418,
      "step": 1080
    },
    {
      "epoch": 3.053221288515406,
      "grad_norm": 1.2909393310546875,
      "learning_rate": 7.006648072799272e-05,
      "loss": 0.2199,
      "step": 1090
    },
    {
      "epoch": 3.081232492997199,
      "grad_norm": 1.4550460577011108,
      "learning_rate": 6.833982184777165e-05,
      "loss": 0.2417,
      "step": 1100
    },
    {
      "epoch": 3.081232492997199,
      "eval_loss": 1.3649441003799438,
      "eval_runtime": 25.9266,
      "eval_samples_per_second": 13.77,
      "eval_steps_per_second": 1.736,
      "step": 1100
    },
    {
      "epoch": 3.1092436974789917,
      "grad_norm": 1.4254740476608276,
      "learning_rate": 6.662359111211268e-05,
      "loss": 0.2217,
      "step": 1110
    },
    {
      "epoch": 3.1372549019607843,
      "grad_norm": 1.446631908416748,
      "learning_rate": 6.491835380849238e-05,
      "loss": 0.2312,
      "step": 1120
    },
    {
      "epoch": 3.165266106442577,
      "grad_norm": 1.636902093887329,
      "learning_rate": 6.322467160340011e-05,
      "loss": 0.2215,
      "step": 1130
    },
    {
      "epoch": 3.19327731092437,
      "grad_norm": 1.3956204652786255,
      "learning_rate": 6.154310235733785e-05,
      "loss": 0.2286,
      "step": 1140
    },
    {
      "epoch": 3.2212885154061626,
      "grad_norm": 1.1625617742538452,
      "learning_rate": 5.9874199941073636e-05,
      "loss": 0.2402,
      "step": 1150
    },
    {
      "epoch": 3.249299719887955,
      "grad_norm": 1.3581691980361938,
      "learning_rate": 5.821851405320903e-05,
      "loss": 0.2313,
      "step": 1160
    },
    {
      "epoch": 3.277310924369748,
      "grad_norm": 1.2999801635742188,
      "learning_rate": 5.657659003912104e-05,
      "loss": 0.2251,
      "step": 1170
    },
    {
      "epoch": 3.3053221288515404,
      "grad_norm": 1.4706475734710693,
      "learning_rate": 5.494896871133743e-05,
      "loss": 0.2213,
      "step": 1180
    },
    {
      "epoch": 3.3333333333333335,
      "grad_norm": 1.4545742273330688,
      "learning_rate": 5.3336186171405764e-05,
      "loss": 0.2333,
      "step": 1190
    },
    {
      "epoch": 3.361344537815126,
      "grad_norm": 1.2404979467391968,
      "learning_rate": 5.173877363331311e-05,
      "loss": 0.2211,
      "step": 1200
    },
    {
      "epoch": 3.361344537815126,
      "eval_loss": 1.4273622035980225,
      "eval_runtime": 26.0864,
      "eval_samples_per_second": 13.685,
      "eval_steps_per_second": 1.725,
      "step": 1200
    },
    {
      "epoch": 3.3893557422969187,
      "grad_norm": 1.3979793787002563,
      "learning_rate": 5.0157257248516896e-05,
      "loss": 0.2291,
      "step": 1210
    },
    {
      "epoch": 3.4173669467787113,
      "grad_norm": 1.7199064493179321,
      "learning_rate": 4.859215793264198e-05,
      "loss": 0.229,
      "step": 1220
    },
    {
      "epoch": 3.4453781512605044,
      "grad_norm": 1.4441235065460205,
      "learning_rate": 4.7043991193903244e-05,
      "loss": 0.2329,
      "step": 1230
    },
    {
      "epoch": 3.473389355742297,
      "grad_norm": 1.363365650177002,
      "learning_rate": 4.5513266963308654e-05,
      "loss": 0.2308,
      "step": 1240
    },
    {
      "epoch": 3.5014005602240896,
      "grad_norm": 1.415418267250061,
      "learning_rate": 4.400048942669962e-05,
      "loss": 0.2267,
      "step": 1250
    },
    {
      "epoch": 3.5294117647058822,
      "grad_norm": 1.3714072704315186,
      "learning_rate": 4.25061568586836e-05,
      "loss": 0.2227,
      "step": 1260
    },
    {
      "epoch": 3.557422969187675,
      "grad_norm": 1.2111363410949707,
      "learning_rate": 4.10307614585139e-05,
      "loss": 0.2322,
      "step": 1270
    },
    {
      "epoch": 3.585434173669468,
      "grad_norm": 1.3968279361724854,
      "learning_rate": 3.957478918797013e-05,
      "loss": 0.225,
      "step": 1280
    },
    {
      "epoch": 3.6134453781512605,
      "grad_norm": 1.4936352968215942,
      "learning_rate": 3.8138719611293595e-05,
      "loss": 0.2191,
      "step": 1290
    },
    {
      "epoch": 3.641456582633053,
      "grad_norm": 1.3193854093551636,
      "learning_rate": 3.6723025737229744e-05,
      "loss": 0.2251,
      "step": 1300
    },
    {
      "epoch": 3.641456582633053,
      "eval_loss": 1.4654343128204346,
      "eval_runtime": 26.0893,
      "eval_samples_per_second": 13.684,
      "eval_steps_per_second": 1.725,
      "step": 1300
    },
    {
      "epoch": 3.669467787114846,
      "grad_norm": 1.2740771770477295,
      "learning_rate": 3.532817386322964e-05,
      "loss": 0.2235,
      "step": 1310
    },
    {
      "epoch": 3.697478991596639,
      "grad_norm": 1.1999931335449219,
      "learning_rate": 3.395462342186244e-05,
      "loss": 0.2244,
      "step": 1320
    },
    {
      "epoch": 3.7254901960784315,
      "grad_norm": 1.4531569480895996,
      "learning_rate": 3.2602826829488296e-05,
      "loss": 0.2218,
      "step": 1330
    },
    {
      "epoch": 3.753501400560224,
      "grad_norm": 1.5755056142807007,
      "learning_rate": 3.127322933724317e-05,
      "loss": 0.2331,
      "step": 1340
    },
    {
      "epoch": 3.7815126050420167,
      "grad_norm": 1.2473876476287842,
      "learning_rate": 2.9966268884382808e-05,
      "loss": 0.2171,
      "step": 1350
    },
    {
      "epoch": 3.8095238095238093,
      "grad_norm": 1.3828644752502441,
      "learning_rate": 2.8682375954035756e-05,
      "loss": 0.2269,
      "step": 1360
    },
    {
      "epoch": 3.8375350140056024,
      "grad_norm": 1.473433494567871,
      "learning_rate": 2.7421973431411775e-05,
      "loss": 0.2121,
      "step": 1370
    },
    {
      "epoch": 3.865546218487395,
      "grad_norm": 1.1347614526748657,
      "learning_rate": 2.618547646451317e-05,
      "loss": 0.2187,
      "step": 1380
    },
    {
      "epoch": 3.8935574229691876,
      "grad_norm": 1.4064631462097168,
      "learning_rate": 2.497329232739438e-05,
      "loss": 0.2243,
      "step": 1390
    },
    {
      "epoch": 3.9215686274509802,
      "grad_norm": 1.394667148590088,
      "learning_rate": 2.378582028601529e-05,
      "loss": 0.222,
      "step": 1400
    },
    {
      "epoch": 3.9215686274509802,
      "eval_loss": 1.474218487739563,
      "eval_runtime": 26.0178,
      "eval_samples_per_second": 13.721,
      "eval_steps_per_second": 1.73,
      "step": 1400
    }
  ],
  "logging_steps": 10,
  "max_steps": 1785,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 5,
  "save_steps": 100,
  "stateful_callbacks": {
    "EarlyStoppingCallback": {
      "args": {
        "early_stopping_patience": 12,
        "early_stopping_threshold": 0.0
      },
      "attributes": {
        "early_stopping_patience_counter": 11
      }
    },
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 9.429376435853722e+16,
  "train_batch_size": 8,
  "trial_name": null,
  "trial_params": null
}
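Two facts can be read off this log even though trainer_state.json does not state them directly. Every logged `epoch` equals `step / 357`, so one epoch is 357 optimizer steps, and 357 × `num_train_epochs` (5) = 1785 = `max_steps`. The `EarlyStoppingCallback` counter of 11 (patience 12, threshold 0.0) is what results if the eval_loss of 1.0371 at step 300 is the best so far and each of the eleven later evaluations (steps 400 to 1400) fails to beat it. Over the same span the training loss falls from about 1.04 to about 0.22 while eval_loss rises from 1.037 to 1.474, a pattern consistent with overfitting after the first epoch. The sketch below replays both checks on the eval entries copied from this excerpt. It assumes the monitored metric is `eval_loss` (lower is better) and is a simplified stand-in for the callback's logic, not the transformers implementation; the helper names are hypothetical.

```python
import math

# (step, epoch, eval_loss) for every eval entry in the log_history excerpt above.
# Earlier evaluations fall outside this excerpt and are not included.
EVALS = [
    (300, 0.8403361344537815, 1.0371309518814087),
    (400, 1.1204481792717087, 1.0602205991744995),
    (500, 1.4005602240896358, 1.0960317850112915),
    (600, 1.680672268907563, 1.0715969800949097),
    (700, 1.9607843137254903, 1.0620192289352417),
    (800, 2.2408963585434174, 1.2616853713989258),
    (900, 2.5210084033613445, 1.254116177558899),
    (1000, 2.8011204481792715, 1.242302417755127),
    (1100, 3.081232492997199, 1.3649441003799438),
    (1200, 3.361344537815126, 1.4273622035980225),
    (1300, 3.641456582633053, 1.4654343128204346),
    (1400, 3.9215686274509802, 1.474218487739563),
]
MAX_STEPS = 1785        # "max_steps" in trainer_state.json
NUM_TRAIN_EPOCHS = 5    # "num_train_epochs" in trainer_state.json


def steps_per_epoch(evals):
    """Infer optimizer steps per epoch from logged (step, epoch) pairs."""
    estimates = {round(step / epoch) for step, epoch, _ in evals}
    if len(estimates) != 1:
        raise ValueError(f"inconsistent steps-per-epoch estimates: {estimates}")
    return estimates.pop()


def replay_early_stopping(evals, patience, threshold=0.0):
    """Replay a simplified early-stopping rule where a lower eval_loss is better.

    The counter resets whenever the loss beats the best value so far by more
    than `threshold` and increments otherwise; training stops once the counter
    reaches `patience`. Returns (best_step, best_loss, counter, stop_step),
    where stop_step is None if the rule never fired.
    """
    best_step, best_loss, counter = None, math.inf, 0
    for step, _epoch, loss in evals:
        if loss < best_loss and abs(loss - best_loss) > threshold:
            best_step, best_loss, counter = step, loss, 0
        else:
            counter += 1
            if counter >= patience:
                return best_step, best_loss, counter, step
    return best_step, best_loss, counter, None
```

`steps_per_epoch(EVALS)` gives 357, and `replay_early_stopping(EVALS, patience=12)` gives `(300, 1.0371309518814087, 11, None)`, matching the stored `early_stopping_patience_counter`. Under this rule, the next evaluation that fails to improve on 1.0371 would bring the counter to 12 and stop training.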
checkpoint-1400/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3e4f1545faf17e9e0c7738e6282a9a7eae62ded595af804131816e84a7e34c23
size 5368
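`training_args.bin` is committed through Git LFS, so the diff shows only the three-line pointer: the spec version, the SHA-256 of the real blob, and its size in bytes (5,368). A minimal sketch for parsing such a pointer and checking a downloaded blob against it; the function names are illustrative and not part of any Git LFS tool.

```python
import hashlib

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3e4f1545faf17e9e0c7738e6282a9a7eae62ded595af804131816e84a7e34c23
size 5368
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError(f"malformed pointer line: {line!r}")
        fields[key] = value
    # The oid is written as "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def blob_matches(data: bytes, pointer: dict) -> bool:
    """Check a downloaded blob's size and SHA-256 against a parsed pointer."""
    if pointer["oid_algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['oid_algo']}")
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]
```

After fetching the real file (for example with `git lfs pull`), `blob_matches(open("checkpoint-1400/training_args.bin", "rb").read(), parse_lfs_pointer(POINTER_TEXT))` should return `True` if the download is intact.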
checkpoint-1500/README.md
ADDED
@@ -0,0 +1,202 @@
---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
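
A minimal loading sketch, assuming the adapter directory has been downloaded locally (`checkpoint-1500` is a placeholder path, not a documented repo id) and that `torch`, `transformers`, `peft`, and `accelerate` (needed for `device_map="auto"`) are installed. The base model id comes from this card's front matter; using the base model's chat template and these generation settings are assumptions, since the card does not document a prompt format.

```python
"""Hypothetical quick-start for this PEFT adapter (not provided by the card)."""

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"  # from the card's front matter
ADAPTER_DIR = "checkpoint-1500"  # placeholder: local path or Hub id of this adapter


def build_messages(prompt: str) -> list:
    """Wrap a single user turn in the message format used by apply_chat_template."""
    return [{"role": "user", "content": prompt}]


def load(adapter_dir: str = ADAPTER_DIR):
    # Heavy dependencies are imported lazily so this file can be imported
    # without torch/transformers/peft installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Attach the adapter weights (adapter_config.json + adapter_model.safetensors).
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 256) -> str:
    import torch

    input_ids = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
```

Usage: `tokenizer, model = load()` followed by `generate(tokenizer, model, "your prompt")`. Loading the tokenizer from the base model avoids relying on the copies saved in the checkpoint directory, which should be equivalent but are not verified here.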
76 |
+
## Training Details
|
77 |
+
|
78 |
+
### Training Data
|
79 |
+
|
80 |
+
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
81 |
+
|
82 |
+
[More Information Needed]
|
83 |
+
|
84 |
+
### Training Procedure
|
85 |
+
|
86 |
+
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
87 |
+
|
88 |
+
#### Preprocessing [optional]
|
89 |
+
|
90 |
+
[More Information Needed]
|
91 |
+
|
92 |
+
|
93 |
+
#### Training Hyperparameters
|
94 |
+
|
95 |
+
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
96 |
+
|
97 |
+
#### Speeds, Sizes, Times [optional]
|
98 |
+
|
99 |
+
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
|
100 |
+
|
101 |
+
[More Information Needed]
|
102 |
+
|
103 |
+
## Evaluation
|
104 |
+
|
105 |
+
<!-- This section describes the evaluation protocols and provides the results. -->
|
106 |
+
|
107 |
+
### Testing Data, Factors & Metrics
|
108 |
+
|
109 |
+
#### Testing Data
|
110 |
+
|
111 |
+
<!-- This should link to a Dataset Card if possible. -->
|
112 |
+
|
113 |
+
[More Information Needed]
|
114 |
+
|
115 |
+
#### Factors
|
116 |
+
|
117 |
+
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
118 |
+
|
119 |
+
[More Information Needed]
|
120 |
+
|
121 |
+
#### Metrics
|
122 |
+
|
123 |
+
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
124 |
+
|
125 |
+
[More Information Needed]
|
126 |
+
|
127 |
+
### Results
|
128 |
+
|
129 |
+
[More Information Needed]
|
130 |
+
|
131 |
+
#### Summary
|
132 |
+
|
133 |
+
|
134 |
+
|
135 |
+
## Model Examination [optional]
|
136 |
+
|
137 |
+
<!-- Relevant interpretability work for the model goes here -->
|
138 |
+
|
139 |
+
[More Information Needed]
|
140 |
+
|
141 |
+
## Environmental Impact
|
142 |
+
|
143 |
+
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
144 |
+
|
145 |
+
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
|
146 |
+
|
147 |
+
- **Hardware Type:** [More Information Needed]
|
148 |
+
- **Hours used:** [More Information Needed]
|
149 |
+
- **Cloud Provider:** [More Information Needed]
|
150 |
+
- **Compute Region:** [More Information Needed]
|
151 |
+
- **Carbon Emitted:** [More Information Needed]
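
The estimate behind calculators like the one cited above is essentially energy drawn multiplied by the carbon intensity of the local grid. The sketch below illustrates that arithmetic. It is deliberately minimal:

- the optional `pue` (power usage effectiveness) factor for data-center overhead is an added assumption;
- the figures in the example are made-up placeholders, not measurements of this training run.

```python
def estimate_emissions_kg(avg_power_watts, hours, grid_kg_per_kwh, pue=1.0):
    """Rough CO2eq estimate in kg: energy drawn (kWh) times grid carbon intensity.

    pue scales device energy up to facility energy; 1.0 counts no overhead.
    """
    if min(avg_power_watts, hours, grid_kg_per_kwh) < 0 or pue < 1.0:
        raise ValueError("inputs must be non-negative and pue >= 1.0")
    energy_kwh = avg_power_watts / 1000.0 * hours * pue
    return energy_kwh * grid_kg_per_kwh


# Placeholder figures only: one GPU averaging 300 W for 10 hours on a grid
# emitting 0.4 kg CO2eq per kWh.
print(round(estimate_emissions_kg(300, 10, 0.4), 3))  # 1.2
```

To fill in the card, the **Hardware Type**, **Hours used** and **Compute Region** entries supply the three inputs.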

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.14.0
checkpoint-1500/adapter_config.json
ADDED
@@ -0,0 +1,37 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
  "bias": "none",
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "up_proj",
    "gate_proj",
    "q_proj",
    "k_proj",
    "v_proj",
    "down_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
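
This config describes a rank-16 LoRA (`r`: 16, `lora_alpha`: 32) on all seven attention and MLP projections. As a sanity check, the sketch below counts the adapter's parameters.

The Mistral-7B layer sizes it uses are assumptions taken from the base model's published configuration; they do not appear in this diff:

- hidden size 4096;
- MLP size 14336;
- 32 layers;
- 8 key/value heads of dimension 128.

Under those assumptions the sketch gives:

- **Parameter count.** The adapter has 41,943,040 parameters, or 167,772,160 bytes in fp32.
- **adapter_model.safetensors.** The pointer lists 167,832,240 bytes. The small remainder is presumably the safetensors header.
- **optimizer.pt.** At 335,922,386 bytes it is consistent with Adam keeping two fp32 moment tensors per trainable parameter.
- **Scaling.** Because `use_rslora` is false, the LoRA update is scaled by `lora_alpha / r`.

```python
# Assumed Mistral-7B dimensions (from the base model's published config,
# not from this diff).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # k_proj/v_proj output width under grouped-query attention

# (in_features, out_features) of each projection named in target_modules.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r, target_modules, num_layers=NUM_LAYERS):
    """LoRA adds A (r x in) and B (out x r) per targeted layer: r * (in + out) params."""
    per_layer = sum(r * sum(PROJECTIONS[name]) for name in target_modules)
    return per_layer * num_layers


def lora_scaling(lora_alpha, r, use_rslora=False):
    """Factor applied to B @ A: alpha / r, or alpha / sqrt(r) for rank-stabilized LoRA."""
    return lora_alpha / (r ** 0.5 if use_rslora else r)


config = {"r": 16, "lora_alpha": 32, "use_rslora": False,
          "target_modules": ["o_proj", "up_proj", "gate_proj", "q_proj",
                             "k_proj", "v_proj", "down_proj"]}

n_params = lora_param_count(config["r"], config["target_modules"])
fp32_bytes = 4 * n_params
print(n_params, fp32_bytes)                             # 41943040 167772160
print(lora_scaling(config["lora_alpha"], config["r"]))  # 2.0
print(round(335922386 / fp32_bytes, 3))                 # 2.002 (optimizer.pt vs. weights)
```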
checkpoint-1500/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e727b7c8b56abbc5942fdfde2dbc5e375b0db508fc8ac7b2cad727cb7b17880f
size 167832240
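
The binary checkpoint files are stored as Git LFS pointers. Each pointer has three lines:

- the spec version;
- the SHA-256 `oid` of the real content;
- its `size` in bytes.

Below is a minimal, stdlib-only sketch for parsing a pointer and checking that a downloaded blob matches it. The function names are illustrative.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, sha256 digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer):
    """Check the size first (cheap), then stream the file through SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


pointer = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:e727b7c8b56abbc5942fdfde2dbc5e375b0db508fc8ac7b2cad727cb7b17880f
size 167832240
""")
print(pointer["size"])  # 167832240
```

After downloading, `matches_pointer("checkpoint-1500/adapter_model.safetensors", pointer)` should return `True` if the file is intact.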
checkpoint-1500/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9e8d62a362ad67f9bc9793c5e8cf130f5a0e8bce9cead1a377b21c539246b14
size 335922386
checkpoint-1500/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:161cf2545ae205d8a7f1a7656c017af30e1e56e5150f9c795f10a302fbc0943e
size 14244
checkpoint-1500/scaler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:495cba9ae11960da6b4497a76242a85485dd5b067748b8f92d20b7738cee245a
size 988
checkpoint-1500/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9d4e03022bce0c02012e522e66435423a73932b4197057e2544fe60d66fe1eff
size 1064
checkpoint-1500/special_tokens_map.json
ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
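
This map sets `pad_token` to `</s>`, the same string as `eos_token`. That setup has a common pitfall. If padding is excluded from the loss by comparing token ids against the pad id, as some data collators do, every genuine end-of-sequence token is masked as well, and the model may never learn to stop. Whether this affected this run depends on the collator, which is not part of this diff.

Masking from the attention mask avoids the problem. Below is a minimal, framework-free sketch. Assumptions:

- id 2 stands for `</s>`;
- ids 415 and 4372 are arbitrary illustrative tokens;
- -100 is the label value PyTorch's cross-entropy ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def labels_from_attention_mask(input_ids, attention_mask):
    """Mask only real padding positions, keeping the sequence's own EOS as a target."""
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, attention_mask)]


def labels_from_pad_id(input_ids, pad_id):
    """The pitfall: when pad_id == eos_id this also hides the genuine EOS."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


EOS_ID = 2  # assumed id of "</s>", reused as the pad token
ids = [1, 415, 4372, EOS_ID, EOS_ID, EOS_ID]  # <s> ... </s> followed by two pads
mask = [1, 1, 1, 1, 0, 0]

print(labels_from_attention_mask(ids, mask))  # [1, 415, 4372, 2, -100, -100]
print(labels_from_pad_id(ids, EOS_ID))        # [1, 415, 4372, -100, -100, -100]
```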
checkpoint-1500/tokenizer.json
ADDED
The diff for this file is too large to render.
checkpoint-1500/tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
size 587404
checkpoint-1500/tokenizer_config.json
ADDED
The diff for this file is too large to render.
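
The trainer_state.json for checkpoint-1500 logs a training loss every 10 steps and an eval loss every 100 steps (`eval_steps`: 100). Its header names checkpoint-300 as `best_model_checkpoint`, with `best_metric` 1.0371309518814087, even though training ran to step 1500. That is epoch 4.2, or 357 optimizer steps per epoch.

The log shows the pattern behind that choice:

- Training loss drops sharply at each epoch boundary, from about 1.04 to about 0.68 and then about 0.37.
- Eval loss bottoms out at step 300 and climbs to about 1.26 by step 800, a typical sign of overfitting.

The sketch below recovers the best checkpoint from eval values copied out of the log. The function names are hypothetical.

```python
import json

STEPS_PER_EPOCH = 357  # global_step 1500 / epoch 4.2016806... in this file

# (step, eval_loss) pairs copied from log_history in trainer_state.json.
EVALS = [
    (100, 1.0852699279785156), (200, 1.0578807592391968),
    (300, 1.0371309518814087), (400, 1.0602205991744995),
    (500, 1.0960317850112915), (600, 1.0715969800949097),
    (700, 1.0620192289352417), (800, 1.2616853713989258),
    (900, 1.254116177558899),
]


def load_evals(path):
    """Read (step, eval_loss) pairs from a Trainer state file."""
    with open(path) as f:
        state = json.load(f)
    return [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]


def best_eval(evals):
    """Return the (step, loss) pair with the lowest eval loss."""
    return min(evals, key=lambda pair: pair[1])


def evals_since_best(evals):
    """Count evaluations logged after the best one (none of them improved on it)."""
    best_step, _ = best_eval(evals)
    return sum(1 for s, _ in evals if s > best_step)


step, loss = best_eval(EVALS)
print(step, round(loss, 4), round(step / STEPS_PER_EPOCH, 3))  # 300 1.0371 0.84
print(evals_since_best(EVALS))  # 6
```

Run on the full file, `load_evals("checkpoint-1500/trainer_state.json")` should presumably also return the evaluations after step 900. If the aim is to stop near the best point, one option in `transformers` is `EarlyStoppingCallback(early_stopping_patience=...)`, combined with `load_best_model_at_end=True` and `metric_for_best_model="eval_loss"` in `TrainingArguments`.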
checkpoint-1500/trainer_state.json
ADDED
@@ -0,0 +1,1213 @@
{
  "best_global_step": 300,
  "best_metric": 1.0371309518814087,
  "best_model_checkpoint": "./results/survey-finetuned-Mistral-7B-Instruct-v0.3/checkpoint-300",
  "epoch": 4.201680672268908,
  "eval_steps": 100,
  "global_step": 1500,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.028011204481792718,
      "grad_norm": 5.871906757354736,
      "learning_rate": 2.962962962962963e-05,
      "loss": 2.3552,
      "step": 10
    },
    {
      "epoch": 0.056022408963585436,
      "grad_norm": 2.389425039291382,
      "learning_rate": 6.296296296296296e-05,
      "loss": 1.7094,
      "step": 20
    },
    {
      "epoch": 0.08403361344537816,
      "grad_norm": 2.160494804382324,
      "learning_rate": 0.0001,
      "loss": 1.4552,
      "step": 30
    },
    {
      "epoch": 0.11204481792717087,
      "grad_norm": 2.3963708877563477,
      "learning_rate": 0.00013703703703703705,
      "loss": 1.2928,
      "step": 40
    },
    {
      "epoch": 0.1400560224089636,
      "grad_norm": 2.192927598953247,
      "learning_rate": 0.00017407407407407408,
      "loss": 1.1542,
      "step": 50
    },
    {
      "epoch": 0.16806722689075632,
      "grad_norm": 1.6872493028640747,
      "learning_rate": 0.00019999851776504612,
      "loss": 1.1238,
      "step": 60
    },
    {
      "epoch": 0.19607843137254902,
      "grad_norm": 1.8370214700698853,
      "learning_rate": 0.00019997216814377874,
      "loss": 1.1234,
      "step": 70
    },
    {
      "epoch": 0.22408963585434175,
      "grad_norm": 1.7727515697479248,
      "learning_rate": 0.0001999128899579094,
      "loss": 1.1143,
      "step": 80
    },
    {
      "epoch": 0.25210084033613445,
      "grad_norm": 1.5649667978286743,
      "learning_rate": 0.00019982070273232797,
      "loss": 1.112,
      "step": 90
    },
    {
      "epoch": 0.2801120448179272,
      "grad_norm": 1.5107511281967163,
      "learning_rate": 0.0001996956368314155,
      "loss": 1.1154,
      "step": 100
    },
    {
      "epoch": 0.2801120448179272,
      "eval_loss": 1.0852699279785156,
      "eval_runtime": 25.3067,
      "eval_samples_per_second": 14.107,
      "eval_steps_per_second": 1.778,
      "step": 100
    },
    {
      "epoch": 0.3081232492997199,
      "grad_norm": 1.5419102907180786,
      "learning_rate": 0.00019953773344904302,
      "loss": 1.0686,
      "step": 110
    },
    {
      "epoch": 0.33613445378151263,
      "grad_norm": 1.5007447004318237,
      "learning_rate": 0.00019934704459500312,
      "loss": 1.0849,
      "step": 120
    },
    {
      "epoch": 0.3641456582633053,
      "grad_norm": 1.67251455783844,
      "learning_rate": 0.000199123633077879,
      "loss": 1.0793,
      "step": 130
    },
    {
      "epoch": 0.39215686274509803,
      "grad_norm": 1.4843816757202148,
      "learning_rate": 0.00019886757248435704,
      "loss": 1.0122,
      "step": 140
    },
    {
      "epoch": 0.42016806722689076,
      "grad_norm": 1.4718296527862549,
      "learning_rate": 0.00019857894715498873,
      "loss": 1.0423,
      "step": 150
    },
    {
      "epoch": 0.4481792717086835,
      "grad_norm": 1.4704537391662598,
      "learning_rate": 0.00019825785215641086,
      "loss": 1.0524,
      "step": 160
    },
    {
      "epoch": 0.47619047619047616,
      "grad_norm": 1.4213836193084717,
      "learning_rate": 0.0001979043932500329,
      "loss": 1.0373,
      "step": 170
    },
    {
      "epoch": 0.5042016806722689,
      "grad_norm": 1.579439401626587,
      "learning_rate": 0.00019751868685720134,
      "loss": 1.0539,
      "step": 180
    },
    {
      "epoch": 0.5322128851540616,
      "grad_norm": 1.4878956079483032,
      "learning_rate": 0.00019710086002085335,
      "loss": 1.0315,
      "step": 190
    },
    {
      "epoch": 0.5602240896358543,
      "grad_norm": 1.5998858213424683,
      "learning_rate": 0.0001966510503636716,
      "loss": 1.038,
      "step": 200
    },
    {
      "epoch": 0.5602240896358543,
      "eval_loss": 1.0578807592391968,
      "eval_runtime": 25.2375,
      "eval_samples_per_second": 14.146,
      "eval_steps_per_second": 1.783,
      "step": 200
    },
    {
      "epoch": 0.5882352941176471,
      "grad_norm": 1.5214334726333618,
      "learning_rate": 0.00019616940604275454,
      "loss": 1.0515,
      "step": 210
    },
    {
      "epoch": 0.6162464985994398,
      "grad_norm": 1.7388910055160522,
      "learning_rate": 0.00019565608570081672,
      "loss": 1.07,
      "step": 220
    },
    {
      "epoch": 0.6442577030812325,
      "grad_norm": 1.5043017864227295,
      "learning_rate": 0.00019511125841393558,
      "loss": 1.0545,
      "step": 230
    },
    {
      "epoch": 0.6722689075630253,
      "grad_norm": 1.5737371444702148,
      "learning_rate": 0.00019453510363586155,
      "loss": 1.0546,
      "step": 240
    },
    {
      "epoch": 0.7002801120448179,
      "grad_norm": 1.5473017692565918,
      "learning_rate": 0.00019392781113891007,
      "loss": 1.0306,
      "step": 250
    },
    {
      "epoch": 0.7282913165266106,
      "grad_norm": 1.6050424575805664,
      "learning_rate": 0.00019328958095145496,
      "loss": 1.0285,
      "step": 260
    },
    {
      "epoch": 0.7563025210084033,
      "grad_norm": 1.4448118209838867,
      "learning_rate": 0.00019262062329204356,
      "loss": 1.0444,
      "step": 270
    },
    {
      "epoch": 0.7843137254901961,
      "grad_norm": 1.7479037046432495,
      "learning_rate": 0.0001919211585001554,
      "loss": 1.0341,
      "step": 280
    },
    {
      "epoch": 0.8123249299719888,
      "grad_norm": 1.4101306200027466,
      "learning_rate": 0.00019119141696362759,
      "loss": 1.0244,
      "step": 290
    },
    {
      "epoch": 0.8403361344537815,
      "grad_norm": 1.4832830429077148,
      "learning_rate": 0.00019043163904277016,
      "loss": 1.0407,
      "step": 300
    },
    {
      "epoch": 0.8403361344537815,
      "eval_loss": 1.0371309518814087,
      "eval_runtime": 25.2977,
      "eval_samples_per_second": 14.112,
      "eval_steps_per_second": 1.779,
      "step": 300
    },
    {
      "epoch": 0.8683473389355743,
      "grad_norm": 1.4755833148956299,
      "learning_rate": 0.00018964207499119686,
      "loss": 1.0462,
      "step": 310
    },
    {
      "epoch": 0.896358543417367,
      "grad_norm": 1.4913283586502075,
      "learning_rate": 0.0001888229848733972,
      "loss": 1.0474,
      "step": 320
    },
    {
      "epoch": 0.9243697478991597,
      "grad_norm": 1.7317923307418823,
      "learning_rate": 0.0001879746384790773,
      "loss": 1.0447,
      "step": 330
    },
    {
      "epoch": 0.9523809523809523,
      "grad_norm": 1.5321850776672363,
      "learning_rate": 0.0001870973152342971,
      "loss": 1.0225,
      "step": 340
    },
    {
      "epoch": 0.9803921568627451,
      "grad_norm": 1.5264012813568115,
      "learning_rate": 0.00018619130410943386,
      "loss": 1.0527,
      "step": 350
    },
    {
      "epoch": 1.0084033613445378,
      "grad_norm": 1.5999464988708496,
      "learning_rate": 0.0001852569035240018,
      "loss": 0.957,
      "step": 360
    },
    {
      "epoch": 1.0364145658263306,
      "grad_norm": 1.4261401891708374,
      "learning_rate": 0.00018429442124835954,
      "loss": 0.6759,
      "step": 370
    },
    {
      "epoch": 1.0644257703081232,
      "grad_norm": 1.6350306272506714,
      "learning_rate": 0.00018330417430233756,
      "loss": 0.6721,
      "step": 380
    },
    {
      "epoch": 1.092436974789916,
      "grad_norm": 1.4598591327667236,
      "learning_rate": 0.00018228648885081886,
      "loss": 0.6695,
      "step": 390
    },
    {
      "epoch": 1.1204481792717087,
      "grad_norm": 1.6912418603897095,
      "learning_rate": 0.00018124170009630787,
      "loss": 0.6482,
      "step": 400
    },
    {
      "epoch": 1.1204481792717087,
      "eval_loss": 1.0602205991744995,
      "eval_runtime": 25.4667,
      "eval_samples_per_second": 14.018,
      "eval_steps_per_second": 1.767,
      "step": 400
    },
    {
      "epoch": 1.1484593837535013,
      "grad_norm": 1.6862925291061401,
      "learning_rate": 0.0001801701521685223,
      "loss": 0.6921,
      "step": 410
    },
    {
      "epoch": 1.1764705882352942,
      "grad_norm": 1.8303461074829102,
      "learning_rate": 0.00017907219801104458,
      "loss": 0.6826,
      "step": 420
    },
    {
      "epoch": 1.2044817927170868,
      "grad_norm": 2.0269150733947754,
      "learning_rate": 0.00017794819926507014,
      "loss": 0.6765,
      "step": 430
    },
    {
      "epoch": 1.2324929971988796,
      "grad_norm": 1.7937819957733154,
      "learning_rate": 0.00017679852615029123,
      "loss": 0.6884,
      "step": 440
    },
    {
      "epoch": 1.2605042016806722,
      "grad_norm": 1.731198787689209,
      "learning_rate": 0.00017562355734295477,
      "loss": 0.6783,
      "step": 450
    },
    {
      "epoch": 1.2885154061624648,
      "grad_norm": 1.8679757118225098,
      "learning_rate": 0.00017442367985113504,
      "loss": 0.6689,
      "step": 460
    },
    {
      "epoch": 1.3165266106442577,
      "grad_norm": 1.6848735809326172,
      "learning_rate": 0.0001731992888872621,
      "loss": 0.6937,
      "step": 470
    },
    {
      "epoch": 1.3445378151260505,
      "grad_norm": 1.5047993659973145,
      "learning_rate": 0.00017195078773794755,
      "loss": 0.6919,
      "step": 480
    },
    {
      "epoch": 1.3725490196078431,
      "grad_norm": 1.7145918607711792,
      "learning_rate": 0.00017067858763115132,
      "loss": 0.6694,
      "step": 490
    },
    {
      "epoch": 1.4005602240896358,
      "grad_norm": 1.6695507764816284,
      "learning_rate": 0.00016938310760073236,
      "loss": 0.6665,
      "step": 500
    },
    {
      "epoch": 1.4005602240896358,
      "eval_loss": 1.0960317850112915,
      "eval_runtime": 25.678,
      "eval_samples_per_second": 13.903,
      "eval_steps_per_second": 1.752,
      "step": 500
    },
    {
      "epoch": 1.4285714285714286,
      "grad_norm": 1.9052048921585083,
      "learning_rate": 0.00016806477434842836,
      "loss": 0.6711,
      "step": 510
    },
    {
      "epoch": 1.4565826330532212,
      "grad_norm": 1.7252157926559448,
      "learning_rate": 0.00016672402210331,
      "loss": 0.7007,
      "step": 520
    },
    {
      "epoch": 1.484593837535014,
      "grad_norm": 1.7970322370529175,
      "learning_rate": 0.00016536129247875567,
      "loss": 0.7164,
      "step": 530
    },
    {
      "epoch": 1.5126050420168067,
      "grad_norm": 1.7098366022109985,
      "learning_rate": 0.00016397703432699404,
      "loss": 0.7117,
      "step": 540
    },
    {
      "epoch": 1.5406162464985993,
      "grad_norm": 1.853852391242981,
      "learning_rate": 0.00016257170359126237,
      "loss": 0.7123,
      "step": 550
    },
    {
      "epoch": 1.5686274509803921,
      "grad_norm": 2.119030714035034,
      "learning_rate": 0.000161145763155629,
      "loss": 0.6952,
      "step": 560
    },
    {
      "epoch": 1.596638655462185,
      "grad_norm": 1.7882400751113892,
      "learning_rate": 0.00015969968269253013,
      "loss": 0.6957,
      "step": 570
    },
    {
      "epoch": 1.6246498599439776,
      "grad_norm": 2.0184831619262695,
      "learning_rate": 0.00015823393850806998,
      "loss": 0.6972,
      "step": 580
    },
    {
      "epoch": 1.6526610644257702,
      "grad_norm": 1.5946452617645264,
      "learning_rate": 0.00015674901338513672,
      "loss": 0.6888,
      "step": 590
    },
    {
      "epoch": 1.680672268907563,
      "grad_norm": 1.7266991138458252,
      "learning_rate": 0.0001552453964243845,
      "loss": 0.7029,
      "step": 600
    },
    {
      "epoch": 1.680672268907563,
      "eval_loss": 1.0715969800949097,
      "eval_runtime": 25.7893,
      "eval_samples_per_second": 13.843,
      "eval_steps_per_second": 1.745,
      "step": 600
    },
    {
      "epoch": 1.708683473389356,
      "grad_norm": 1.7430881261825562,
      "learning_rate": 0.000153723582883135,
      "loss": 0.7032,
      "step": 610
    },
    {
      "epoch": 1.7366946778711485,
      "grad_norm": 1.88009774684906,
      "learning_rate": 0.00015218407401225057,
      "loss": 0.6902,
      "step": 620
    },
    {
      "epoch": 1.7647058823529411,
      "grad_norm": 1.7102521657943726,
      "learning_rate": 0.000150627376891034,
      "loss": 0.6866,
      "step": 630
    },
    {
      "epoch": 1.7927170868347337,
      "grad_norm": 1.854864239692688,
      "learning_rate": 0.0001490540042602077,
      "loss": 0.6944,
      "step": 640
    },
    {
      "epoch": 1.8207282913165266,
      "grad_norm": 1.61717689037323,
      "learning_rate": 0.0001474644743530288,
      "loss": 0.7043,
      "step": 650
    },
    {
      "epoch": 1.8487394957983194,
      "grad_norm": 1.7749794721603394,
      "learning_rate": 0.00014585931072459474,
      "loss": 0.7008,
      "step": 660
    },
    {
      "epoch": 1.876750700280112,
      "grad_norm": 1.7132004499435425,
      "learning_rate": 0.00014423904207939615,
      "loss": 0.7167,
      "step": 670
    },
    {
      "epoch": 1.9047619047619047,
      "grad_norm": 1.784264326095581,
      "learning_rate": 0.0001426042020971736,
      "loss": 0.7203,
      "step": 680
    },
    {
      "epoch": 1.9327731092436975,
      "grad_norm": 1.8548870086669922,
      "learning_rate": 0.00014095532925713553,
      "loss": 0.7198,
      "step": 690
    },
    {
      "epoch": 1.9607843137254903,
      "grad_norm": 1.4964855909347534,
      "learning_rate": 0.00013929296666059544,
      "loss": 0.7053,
      "step": 700
    },
    {
      "epoch": 1.9607843137254903,
      "eval_loss": 1.0620192289352417,
      "eval_runtime": 25.7481,
      "eval_samples_per_second": 13.865,
      "eval_steps_per_second": 1.748,
      "step": 700
    },
    {
      "epoch": 1.988795518207283,
      "grad_norm": 1.7073538303375244,
      "learning_rate": 0.00013761766185208663,
      "loss": 0.7338,
      "step": 710
    },
    {
      "epoch": 2.0168067226890756,
      "grad_norm": 1.4009623527526855,
      "learning_rate": 0.00013592996663901354,
      "loss": 0.4904,
      "step": 720
    },
    {
      "epoch": 2.044817927170868,
      "grad_norm": 1.6378936767578125,
      "learning_rate": 0.00013423043690989874,
      "loss": 0.3822,
      "step": 730
    },
    {
      "epoch": 2.0728291316526612,
      "grad_norm": 1.8590551614761353,
      "learning_rate": 0.00013251963245128603,
      "loss": 0.3615,
      "step": 740
    },
    {
      "epoch": 2.100840336134454,
      "grad_norm": 1.7983025312423706,
      "learning_rate": 0.00013079811676335935,
      "loss": 0.356,
      "step": 750
    },
    {
      "epoch": 2.1288515406162465,
      "grad_norm": 2.0292270183563232,
      "learning_rate": 0.00012906645687433857,
      "loss": 0.3672,
      "step": 760
    },
    {
      "epoch": 2.156862745098039,
      "grad_norm": 1.774778127670288,
      "learning_rate": 0.00012732522315371338,
      "loss": 0.3758,
      "step": 770
    },
    {
      "epoch": 2.184873949579832,
      "grad_norm": 1.9023045301437378,
      "learning_rate": 0.0001255749891243764,
      "loss": 0.3674,
      "step": 780
    },
    {
      "epoch": 2.212885154061625,
      "grad_norm": 1.4395744800567627,
      "learning_rate": 0.00012381633127371783,
      "loss": 0.3723,
      "step": 790
    },
    {
      "epoch": 2.2408963585434174,
      "grad_norm": 1.6131172180175781,
      "learning_rate": 0.00012204982886374348,
      "loss": 0.3699,
      "step": 800
    },
    {
      "epoch": 2.2408963585434174,
      "eval_loss": 1.2616853713989258,
      "eval_runtime": 26.08,
      "eval_samples_per_second": 13.689,
      "eval_steps_per_second": 1.725,
      "step": 800
    },
    {
      "epoch": 2.26890756302521,
      "grad_norm": 1.782162070274353,
      "learning_rate": 0.00012027606374027915,
      "loss": 0.3834,
      "step": 810
    },
    {
      "epoch": 2.2969187675070026,
      "grad_norm": 1.7493244409561157,
      "learning_rate": 0.00011849562014132367,
      "loss": 0.3722,
      "step": 820
    },
    {
      "epoch": 2.3249299719887957,
      "grad_norm": 1.5457279682159424,
      "learning_rate": 0.00011670908450461421,
      "loss": 0.3836,
      "step": 830
    },
    {
      "epoch": 2.3529411764705883,
      "grad_norm": 1.914865255355835,
      "learning_rate": 0.00011491704527446705,
      "loss": 0.3929,
      "step": 840
    },
    {
      "epoch": 2.380952380952381,
      "grad_norm": 1.9673585891723633,
      "learning_rate": 0.00011312009270795729,
      "loss": 0.3736,
      "step": 850
    },
    {
      "epoch": 2.4089635854341735,
      "grad_norm": 1.508853554725647,
      "learning_rate": 0.00011131881868050152,
      "loss": 0.3818,
      "step": 860
    },
    {
      "epoch": 2.4369747899159666,
      "grad_norm": 1.633651852607727,
      "learning_rate": 0.00010951381649090766,
      "loss": 0.3706,
      "step": 870
    },
    {
      "epoch": 2.4649859943977592,
      "grad_norm": 1.7917840480804443,
      "learning_rate": 0.00010770568066595556,
      "loss": 0.372,
      "step": 880
    },
    {
      "epoch": 2.492997198879552,
      "grad_norm": 1.7907308340072632,
      "learning_rate": 0.00010589500676457358,
      "loss": 0.392,
      "step": 890
    },
    {
      "epoch": 2.5210084033613445,
      "grad_norm": 1.6397713422775269,
      "learning_rate": 0.00010408239118167482,
      "loss": 0.3866,
      "step": 900
    },
    {
      "epoch": 2.5210084033613445,
      "eval_loss": 1.254116177558899,
      "eval_runtime": 26.2156,
      "eval_samples_per_second": 13.618,
      "eval_steps_per_second": 1.717,
      "step": 900
    },
    {
      "epoch": 2.549019607843137,
      "grad_norm": 1.613955020904541,
      "learning_rate": 0.00010226843095171827,
      "loss": 0.3973,
      "step": 910
    },
    {
      "epoch": 2.5770308123249297,
      "grad_norm": 1.7094041109085083,
      "learning_rate": 0.00010045372355205915,
      "loss": 0.3697,
      "step": 920
    },
    {
      "epoch": 2.6050420168067228,
      "grad_norm": 1.6542613506317139,
      "learning_rate": 9.86388667061534e-05,
      "loss": 0.3773,
      "step": 930
    },
    {
      "epoch": 2.6330532212885154,
      "grad_norm": 1.57974374294281,
      "learning_rate": 9.68244581866812e-05,
      "loss": 0.381,
      "step": 940
    },
    {
      "epoch": 2.661064425770308,
      "grad_norm": 1.8591638803482056,
      "learning_rate": 9.501109561865415e-05,
      "loss": 0.3722,
      "step": 950
    },
    {
      "epoch": 2.689075630252101,
      "grad_norm": 1.8032310009002686,
      "learning_rate": 9.319937628257121e-05,
      "loss": 0.3812,
      "step": 960
    },
    {
      "epoch": 2.7170868347338937,
      "grad_norm": 1.6205816268920898,
      "learning_rate": 9.1389896917688e-05,
      "loss": 0.358,
      "step": 970
    },
    {
      "epoch": 2.7450980392156863,
      "grad_norm": 1.5483547449111938,
      "learning_rate": 8.958325352546433e-05,
      "loss": 0.3829,
      "step": 980
    },
    {
      "epoch": 2.773109243697479,
      "grad_norm": 1.7492879629135132,
      "learning_rate": 8.778004117325508e-05,
      "loss": 0.3832,
      "step": 990
    },
    {
      "epoch": 2.8011204481792715,
|
779 |
+
"grad_norm": 1.752061128616333,
|
780 |
+
"learning_rate": 8.598085379830831e-05,
|
781 |
+
"loss": 0.3708,
|
782 |
+
"step": 1000
|
783 |
+
},
|
784 |
+
{
|
785 |
+
"epoch": 2.8011204481792715,
|
786 |
+
"eval_loss": 1.242302417755127,
|
787 |
+
"eval_runtime": 26.0048,
|
788 |
+
"eval_samples_per_second": 13.728,
|
789 |
+
"eval_steps_per_second": 1.73,
|
790 |
+
"step": 1000
|
791 |
+
},
|
792 |
+
{
|
793 |
+
"epoch": 2.8291316526610646,
|
794 |
+
"grad_norm": 1.7680628299713135,
|
795 |
+
"learning_rate": 8.418628401213588e-05,
|
796 |
+
"loss": 0.3916,
|
797 |
+
"step": 1010
|
798 |
+
},
|
799 |
+
{
|
800 |
+
"epoch": 2.857142857142857,
|
801 |
+
"grad_norm": 2.042778253555298,
|
802 |
+
"learning_rate": 8.239692290532066e-05,
|
803 |
+
"loss": 0.375,
|
804 |
+
"step": 1020
|
805 |
+
},
|
806 |
+
{
|
807 |
+
"epoch": 2.88515406162465,
|
808 |
+
"grad_norm": 1.5636996030807495,
|
809 |
+
"learning_rate": 8.061335985282463e-05,
|
810 |
+
"loss": 0.371,
|
811 |
+
"step": 1030
|
812 |
+
},
|
813 |
+
{
|
814 |
+
"epoch": 2.9131652661064424,
|
815 |
+
"grad_norm": 2.1579694747924805,
|
816 |
+
"learning_rate": 7.883618231986223e-05,
|
817 |
+
"loss": 0.3728,
|
818 |
+
"step": 1040
|
819 |
+
},
|
820 |
+
{
|
821 |
+
"epoch": 2.9411764705882355,
|
822 |
+
"grad_norm": 1.8203998804092407,
|
823 |
+
"learning_rate": 7.706597566840258e-05,
|
824 |
+
"loss": 0.3752,
|
825 |
+
"step": 1050
|
826 |
+
},
|
827 |
+
{
|
828 |
+
"epoch": 2.969187675070028,
|
829 |
+
"grad_norm": 1.7312395572662354,
|
830 |
+
"learning_rate": 7.53033229643646e-05,
|
831 |
+
"loss": 0.3789,
|
832 |
+
"step": 1060
|
833 |
+
},
|
834 |
+
{
|
835 |
+
"epoch": 2.9971988795518207,
|
836 |
+
"grad_norm": 1.616680383682251,
|
837 |
+
"learning_rate": 7.354880478556818e-05,
|
838 |
+
"loss": 0.3682,
|
839 |
+
"step": 1070
|
840 |
+
},
|
841 |
+
{
|
842 |
+
"epoch": 3.0252100840336134,
|
843 |
+
"grad_norm": 1.6187553405761719,
|
844 |
+
"learning_rate": 7.18029990305052e-05,
|
845 |
+
"loss": 0.2418,
|
846 |
+
"step": 1080
|
847 |
+
},
|
848 |
+
{
|
849 |
+
"epoch": 3.053221288515406,
|
850 |
+
"grad_norm": 1.2909393310546875,
|
851 |
+
"learning_rate": 7.006648072799272e-05,
|
852 |
+
"loss": 0.2199,
|
853 |
+
"step": 1090
|
854 |
+
},
|
855 |
+
{
|
856 |
+
"epoch": 3.081232492997199,
|
857 |
+
"grad_norm": 1.4550460577011108,
|
858 |
+
"learning_rate": 6.833982184777165e-05,
|
859 |
+
"loss": 0.2417,
|
860 |
+
"step": 1100
|
861 |
+
},
|
862 |
+
{
|
863 |
+
"epoch": 3.081232492997199,
|
864 |
+
"eval_loss": 1.3649441003799438,
|
865 |
+
"eval_runtime": 25.9266,
|
866 |
+
"eval_samples_per_second": 13.77,
|
867 |
+
"eval_steps_per_second": 1.736,
|
868 |
+
"step": 1100
|
869 |
+
},
|
870 |
+
{
|
871 |
+
"epoch": 3.1092436974789917,
|
872 |
+
"grad_norm": 1.4254740476608276,
|
873 |
+
"learning_rate": 6.662359111211268e-05,
|
874 |
+
"loss": 0.2217,
|
875 |
+
"step": 1110
|
876 |
+
},
|
877 |
+
{
|
878 |
+
"epoch": 3.1372549019607843,
|
879 |
+
"grad_norm": 1.446631908416748,
|
880 |
+
"learning_rate": 6.491835380849238e-05,
|
881 |
+
"loss": 0.2312,
|
882 |
+
"step": 1120
|
883 |
+
},
|
884 |
+
{
|
885 |
+
"epoch": 3.165266106442577,
|
886 |
+
"grad_norm": 1.636902093887329,
|
887 |
+
"learning_rate": 6.322467160340011e-05,
|
888 |
+
"loss": 0.2215,
|
889 |
+
"step": 1130
|
890 |
+
},
|
891 |
+
{
|
892 |
+
"epoch": 3.19327731092437,
|
893 |
+
"grad_norm": 1.3956204652786255,
|
894 |
+
"learning_rate": 6.154310235733785e-05,
|
895 |
+
"loss": 0.2286,
|
896 |
+
"step": 1140
|
897 |
+
},
|
898 |
+
{
|
899 |
+
"epoch": 3.2212885154061626,
|
900 |
+
"grad_norm": 1.1625617742538452,
|
901 |
+
"learning_rate": 5.9874199941073636e-05,
|
902 |
+
"loss": 0.2402,
|
903 |
+
"step": 1150
|
904 |
+
},
|
905 |
+
{
|
906 |
+
"epoch": 3.249299719887955,
|
907 |
+
"grad_norm": 1.3581691980361938,
|
908 |
+
"learning_rate": 5.821851405320903e-05,
|
909 |
+
"loss": 0.2313,
|
910 |
+
"step": 1160
|
911 |
+
},
|
912 |
+
{
|
913 |
+
"epoch": 3.277310924369748,
|
914 |
+
"grad_norm": 1.2999801635742188,
|
915 |
+
"learning_rate": 5.657659003912104e-05,
|
916 |
+
"loss": 0.2251,
|
917 |
+
"step": 1170
|
918 |
+
},
|
919 |
+
{
|
920 |
+
"epoch": 3.3053221288515404,
|
921 |
+
"grad_norm": 1.4706475734710693,
|
922 |
+
"learning_rate": 5.494896871133743e-05,
|
923 |
+
"loss": 0.2213,
|
924 |
+
"step": 1180
|
925 |
+
},
|
926 |
+
{
|
927 |
+
"epoch": 3.3333333333333335,
|
928 |
+
"grad_norm": 1.4545742273330688,
|
929 |
+
"learning_rate": 5.3336186171405764e-05,
|
930 |
+
"loss": 0.2333,
|
931 |
+
"step": 1190
|
932 |
+
},
|
933 |
+
{
|
934 |
+
"epoch": 3.361344537815126,
|
935 |
+
"grad_norm": 1.2404979467391968,
|
936 |
+
"learning_rate": 5.173877363331311e-05,
|
937 |
+
"loss": 0.2211,
|
938 |
+
"step": 1200
|
939 |
+
},
|
940 |
+
{
|
941 |
+
"epoch": 3.361344537815126,
|
942 |
+
"eval_loss": 1.4273622035980225,
|
943 |
+
"eval_runtime": 26.0864,
|
944 |
+
"eval_samples_per_second": 13.685,
|
945 |
+
"eval_steps_per_second": 1.725,
|
946 |
+
"step": 1200
|
947 |
+
},
|
948 |
+
{
|
949 |
+
"epoch": 3.3893557422969187,
|
950 |
+
"grad_norm": 1.3979793787002563,
|
951 |
+
"learning_rate": 5.0157257248516896e-05,
|
952 |
+
"loss": 0.2291,
|
953 |
+
"step": 1210
|
954 |
+
},
|
955 |
+
{
|
956 |
+
"epoch": 3.4173669467787113,
|
957 |
+
"grad_norm": 1.7199064493179321,
|
958 |
+
"learning_rate": 4.859215793264198e-05,
|
959 |
+
"loss": 0.229,
|
960 |
+
"step": 1220
|
961 |
+
},
|
962 |
+
{
|
963 |
+
"epoch": 3.4453781512605044,
|
964 |
+
"grad_norm": 1.4441235065460205,
|
965 |
+
"learning_rate": 4.7043991193903244e-05,
|
966 |
+
"loss": 0.2329,
|
967 |
+
"step": 1230
|
968 |
+
},
|
969 |
+
{
|
970 |
+
"epoch": 3.473389355742297,
|
971 |
+
"grad_norm": 1.363365650177002,
|
972 |
+
"learning_rate": 4.5513266963308654e-05,
|
973 |
+
"loss": 0.2308,
|
974 |
+
"step": 1240
|
975 |
+
},
|
976 |
+
{
|
977 |
+
"epoch": 3.5014005602240896,
|
978 |
+
"grad_norm": 1.415418267250061,
|
979 |
+
"learning_rate": 4.400048942669962e-05,
|
980 |
+
"loss": 0.2267,
|
981 |
+
"step": 1250
|
982 |
+
},
|
983 |
+
{
|
984 |
+
"epoch": 3.5294117647058822,
|
985 |
+
"grad_norm": 1.3714072704315186,
|
986 |
+
"learning_rate": 4.25061568586836e-05,
|
987 |
+
"loss": 0.2227,
|
988 |
+
"step": 1260
|
989 |
+
},
|
990 |
+
{
|
991 |
+
"epoch": 3.557422969187675,
|
992 |
+
"grad_norm": 1.2111363410949707,
|
993 |
+
"learning_rate": 4.10307614585139e-05,
|
994 |
+
"loss": 0.2322,
|
995 |
+
"step": 1270
|
996 |
+
},
|
997 |
+
{
|
998 |
+
"epoch": 3.585434173669468,
|
999 |
+
"grad_norm": 1.3968279361724854,
|
1000 |
+
"learning_rate": 3.957478918797013e-05,
|
1001 |
+
"loss": 0.225,
|
1002 |
+
"step": 1280
|
1003 |
+
},
|
1004 |
+
{
|
1005 |
+
"epoch": 3.6134453781512605,
|
1006 |
+
"grad_norm": 1.4936352968215942,
|
1007 |
+
"learning_rate": 3.8138719611293595e-05,
|
1008 |
+
"loss": 0.2191,
|
1009 |
+
"step": 1290
|
1010 |
+
},
|
1011 |
+
{
|
1012 |
+
"epoch": 3.641456582633053,
|
1013 |
+
"grad_norm": 1.3193854093551636,
|
1014 |
+
"learning_rate": 3.6723025737229744e-05,
|
1015 |
+
"loss": 0.2251,
|
1016 |
+
"step": 1300
|
1017 |
+
},
|
1018 |
+
{
|
1019 |
+
"epoch": 3.641456582633053,
|
1020 |
+
"eval_loss": 1.4654343128204346,
|
1021 |
+
"eval_runtime": 26.0893,
|
1022 |
+
"eval_samples_per_second": 13.684,
|
1023 |
+
"eval_steps_per_second": 1.725,
|
1024 |
+
"step": 1300
|
1025 |
+
},
|
1026 |
+
{
|
1027 |
+
"epoch": 3.669467787114846,
|
1028 |
+
"grad_norm": 1.2740771770477295,
|
1029 |
+
"learning_rate": 3.532817386322964e-05,
|
1030 |
+
"loss": 0.2235,
|
1031 |
+
"step": 1310
|
1032 |
+
},
|
1033 |
+
{
|
1034 |
+
"epoch": 3.697478991596639,
|
1035 |
+
"grad_norm": 1.1999931335449219,
|
1036 |
+
"learning_rate": 3.395462342186244e-05,
|
1037 |
+
"loss": 0.2244,
|
1038 |
+
"step": 1320
|
1039 |
+
},
|
1040 |
+
{
|
1041 |
+
"epoch": 3.7254901960784315,
|
1042 |
+
"grad_norm": 1.4531569480895996,
|
1043 |
+
"learning_rate": 3.2602826829488296e-05,
|
1044 |
+
"loss": 0.2218,
|
1045 |
+
"step": 1330
|
1046 |
+
},
|
1047 |
+
{
|
1048 |
+
"epoch": 3.753501400560224,
|
1049 |
+
"grad_norm": 1.5755056142807007,
|
1050 |
+
"learning_rate": 3.127322933724317e-05,
|
1051 |
+
"loss": 0.2331,
|
1052 |
+
"step": 1340
|
1053 |
+
},
|
1054 |
+
{
|
1055 |
+
"epoch": 3.7815126050420167,
|
1056 |
+
"grad_norm": 1.2473876476287842,
|
1057 |
+
"learning_rate": 2.9966268884382808e-05,
|
1058 |
+
"loss": 0.2171,
|
1059 |
+
"step": 1350
|
1060 |
+
},
|
1061 |
+
{
|
1062 |
+
"epoch": 3.8095238095238093,
|
1063 |
+
"grad_norm": 1.3828644752502441,
|
1064 |
+
"learning_rate": 2.8682375954035756e-05,
|
1065 |
+
"loss": 0.2269,
|
1066 |
+
"step": 1360
|
1067 |
+
},
|
1068 |
+
{
|
1069 |
+
"epoch": 3.8375350140056024,
|
1070 |
+
"grad_norm": 1.473433494567871,
|
1071 |
+
"learning_rate": 2.7421973431411775e-05,
|
1072 |
+
"loss": 0.2121,
|
1073 |
+
"step": 1370
|
1074 |
+
},
|
1075 |
+
{
|
1076 |
+
"epoch": 3.865546218487395,
|
1077 |
+
"grad_norm": 1.1347614526748657,
|
1078 |
+
"learning_rate": 2.618547646451317e-05,
|
1079 |
+
"loss": 0.2187,
|
1080 |
+
"step": 1380
|
1081 |
+
},
|
1082 |
+
{
|
1083 |
+
"epoch": 3.8935574229691876,
|
1084 |
+
"grad_norm": 1.4064631462097168,
|
1085 |
+
"learning_rate": 2.497329232739438e-05,
|
1086 |
+
"loss": 0.2243,
|
1087 |
+
"step": 1390
|
1088 |
+
},
|
1089 |
+
{
|
1090 |
+
"epoch": 3.9215686274509802,
|
1091 |
+
"grad_norm": 1.394667148590088,
|
1092 |
+
"learning_rate": 2.378582028601529e-05,
|
1093 |
+
"loss": 0.222,
|
1094 |
+
"step": 1400
|
1095 |
+
},
|
1096 |
+
{
|
1097 |
+
"epoch": 3.9215686274509802,
|
1098 |
+
"eval_loss": 1.474218487739563,
|
1099 |
+
"eval_runtime": 26.0178,
|
1100 |
+
"eval_samples_per_second": 13.721,
|
1101 |
+
"eval_steps_per_second": 1.73,
|
1102 |
+
"step": 1400
|
1103 |
+
},
|
1104 |
+
{
|
1105 |
+
"epoch": 3.9495798319327733,
|
1106 |
+
"grad_norm": 1.4565016031265259,
|
1107 |
+
"learning_rate": 2.262345146673186e-05,
|
1108 |
+
"loss": 0.2228,
|
1109 |
+
"step": 1410
|
1110 |
+
},
|
1111 |
+
{
|
1112 |
+
"epoch": 3.977591036414566,
|
1113 |
+
"grad_norm": 0.9852085113525391,
|
1114 |
+
"learning_rate": 2.1486568727468115e-05,
|
1115 |
+
"loss": 0.2195,
|
1116 |
+
"step": 1420
|
1117 |
+
},
|
1118 |
+
{
|
1119 |
+
"epoch": 4.0056022408963585,
|
1120 |
+
"grad_norm": 0.557214617729187,
|
1121 |
+
"learning_rate": 2.0375546531611455e-05,
|
1122 |
+
"loss": 0.2129,
|
1123 |
+
"step": 1430
|
1124 |
+
},
|
1125 |
+
{
|
1126 |
+
"epoch": 4.033613445378151,
|
1127 |
+
"grad_norm": 0.5626720786094666,
|
1128 |
+
"learning_rate": 1.9290750824672766e-05,
|
1129 |
+
"loss": 0.169,
|
1130 |
+
"step": 1440
|
1131 |
+
},
|
1132 |
+
{
|
1133 |
+
"epoch": 4.061624649859944,
|
1134 |
+
"grad_norm": 0.6319034695625305,
|
1135 |
+
"learning_rate": 1.823253891375247e-05,
|
1136 |
+
"loss": 0.1704,
|
1137 |
+
"step": 1450
|
1138 |
+
},
|
1139 |
+
{
|
1140 |
+
"epoch": 4.089635854341736,
|
1141 |
+
"grad_norm": 1.131064534187317,
|
1142 |
+
"learning_rate": 1.7201259349851416e-05,
|
1143 |
+
"loss": 0.167,
|
1144 |
+
"step": 1460
|
1145 |
+
},
|
1146 |
+
{
|
1147 |
+
"epoch": 4.117647058823529,
|
1148 |
+
"grad_norm": 0.8115044832229614,
|
1149 |
+
"learning_rate": 1.619725181306637e-05,
|
1150 |
+
"loss": 0.1697,
|
1151 |
+
"step": 1470
|
1152 |
+
},
|
1153 |
+
{
|
1154 |
+
"epoch": 4.1456582633053225,
|
1155 |
+
"grad_norm": 0.8886174559593201,
|
1156 |
+
"learning_rate": 1.5220847000706839e-05,
|
1157 |
+
"loss": 0.1724,
|
1158 |
+
"step": 1480
|
1159 |
+
},
|
1160 |
+
{
|
1161 |
+
"epoch": 4.173669467787115,
|
1162 |
+
"grad_norm": 0.7718709707260132,
|
1163 |
+
"learning_rate": 1.4272366518371093e-05,
|
1164 |
+
"loss": 0.1701,
|
1165 |
+
"step": 1490
|
1166 |
+
},
|
1167 |
+
{
|
1168 |
+
"epoch": 4.201680672268908,
|
1169 |
+
"grad_norm": 0.8207479119300842,
|
1170 |
+
"learning_rate": 1.3352122774016407e-05,
|
1171 |
+
"loss": 0.1658,
|
1172 |
+
"step": 1500
|
1173 |
+
},
|
1174 |
+
{
|
1175 |
+
"epoch": 4.201680672268908,
|
1176 |
+
"eval_loss": 1.7015750408172607,
|
1177 |
+
"eval_runtime": 26.1361,
|
1178 |
+
"eval_samples_per_second": 13.659,
|
1179 |
+
"eval_steps_per_second": 1.722,
|
1180 |
+
"step": 1500
|
1181 |
+
}
|
1182 |
+
],
|
1183 |
+
"logging_steps": 10,
|
1184 |
+
"max_steps": 1785,
|
1185 |
+
"num_input_tokens_seen": 0,
|
1186 |
+
"num_train_epochs": 5,
|
1187 |
+
"save_steps": 100,
|
1188 |
+
"stateful_callbacks": {
|
1189 |
+
"EarlyStoppingCallback": {
|
1190 |
+
"args": {
|
1191 |
+
"early_stopping_patience": 12,
|
1192 |
+
"early_stopping_threshold": 0.0
|
1193 |
+
},
|
1194 |
+
"attributes": {
|
1195 |
+
"early_stopping_patience_counter": 12
|
1196 |
+
}
|
1197 |
+
},
|
1198 |
+
"TrainerControl": {
|
1199 |
+
"args": {
|
1200 |
+
"should_epoch_stop": false,
|
1201 |
+
"should_evaluate": false,
|
1202 |
+
"should_log": false,
|
1203 |
+
"should_save": true,
|
1204 |
+
"should_training_stop": true
|
1205 |
+
},
|
1206 |
+
"attributes": {}
|
1207 |
+
}
|
1208 |
+
},
|
1209 |
+
"total_flos": 1.0091048497210982e+17,
|
1210 |
+
"train_batch_size": 8,
|
1211 |
+
"trial_name": null,
|
1212 |
+
"trial_params": null
|
1213 |
+
}
|
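In this state, EarlyStoppingCallback is configured with early_stopping_patience 12 and threshold 0.0. Its counter stands at 12, and TrainerControl records should_training_stop: true, so training ended at step 1500 rather than at max_steps 1785. Over the same window, eval_loss climbs from 1.2423 (step 1000) to 1.7016 (step 1500) while the training loss keeps falling. Evaluations run every 100 steps. If the best evaluation was the one at step 300 (the best_global_step recorded in checkpoint-300), that leaves exactly 12 non-improving evaluations (steps 400 through 1500), which matches the counter.

The sketch below re-implements that patience rule in simplified form for inspecting a trainer_state.json. It is not the transformers callback itself, and the helper names are hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple


def load_log_history(path: str) -> list:
    """Read the log_history list from a Trainer trainer_state.json file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["log_history"]


def eval_points(log_history: list) -> list:
    """Keep only evaluation entries, as (step, eval_loss) pairs in log order."""
    return [(e["step"], e["eval_loss"]) for e in log_history if "eval_loss" in e]


def patience_counter(points: Iterable[Tuple[int, float]], threshold: float = 0.0):
    """Simplified patience rule for a lower-is-better metric.

    The counter resets when a loss beats the best so far by more than
    `threshold`; otherwise it grows by one.
    Returns (best_step, best_loss, counter).
    """
    best_step: Optional[int] = None
    best_loss: Optional[float] = None
    counter = 0
    for step, loss in points:
        if best_loss is None or best_loss - loss > threshold:
            best_step, best_loss, counter = step, loss, 0
        else:
            counter += 1
    return best_step, best_loss, counter
```

Applied to the full checkpoint-1500 log, `patience_counter(eval_points(load_log_history(path)))` would be expected to report step 300 with a counter of 12, if the log matches the description above.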
checkpoint-1500/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3e4f1545faf17e9e0c7738e6282a9a7eae62ded595af804131816e84a7e34c23
size 5368
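Binary artifacts such as training_args.bin are stored through Git LFS, so the diff shows only a pointer: the spec version, a SHA-256 object id, and the size in bytes. The following is a minimal, standard-library-only sketch for checking that a downloaded file matches such a pointer. The function names are illustrative and are not part of any Git LFS tooling.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_lfs_pointer(path: str, pointer_text: str) -> bool:
    """True if the file at `path` has the pointer's size and SHA-256 digest."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    # Cheap size check first, then hash the file in 1 MiB chunks.
    if os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# The pointer shown above for checkpoint-1500/training_args.bin.
TRAINING_ARGS_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3e4f1545faf17e9e0c7738e6282a9a7eae62ded595af804131816e84a7e34c23
size 5368
"""
```

The size comparison runs before hashing, so a truncated download is rejected without reading the whole file.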
checkpoint-300/README.md
ADDED
@@ -0,0 +1,202 @@
---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.14.0
checkpoint-300/adapter_config.json
ADDED
@@ -0,0 +1,37 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
  "bias": "none",
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "up_proj",
    "gate_proj",
    "q_proj",
    "k_proj",
    "v_proj",
    "down_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
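This config sets r = 16 and lists all seven attention and MLP projections in target_modules. Each adapted linear layer of shape d_in × d_out gains two low-rank factors, A (r × d_in) and B (d_out × r), which adds r·(d_in + d_out) weights per layer. lora_alpha = 32 only scales the update by alpha/r = 2 and adds no parameters, and bias "none" adds no bias terms.

The sketch below totals the extra weights. It assumes the published Mistral-7B dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); these dimensions are not stated in this diff.

```python
# Assumed Mistral-7B dimensions (from the base model's published config,
# not from this diff).
HIDDEN = 4096
INTERMEDIATE = 14336
N_LAYERS = 32
N_HEADS = 32
N_KV_HEADS = 8
HEAD_DIM = HIDDEN // N_HEADS  # 128


def module_shapes() -> dict:
    """(d_in, d_out) for each projection targeted by the adapter config."""
    kv_out = N_KV_HEADS * HEAD_DIM  # 1024: grouped-query attention
    return {
        "q_proj": (HIDDEN, HIDDEN),
        "k_proj": (HIDDEN, kv_out),
        "v_proj": (HIDDEN, kv_out),
        "o_proj": (HIDDEN, HIDDEN),
        "gate_proj": (HIDDEN, INTERMEDIATE),
        "up_proj": (HIDDEN, INTERMEDIATE),
        "down_proj": (INTERMEDIATE, HIDDEN),
    }


def lora_param_count(r: int, target_modules: list) -> int:
    """Total LoRA weights: r * (d_in + d_out) per targeted module per layer."""
    shapes = module_shapes()
    per_layer = sum(r * (shapes[m][0] + shapes[m][1]) for m in target_modules)
    return per_layer * N_LAYERS


TARGETS = ["o_proj", "up_proj", "gate_proj", "q_proj", "k_proj", "v_proj", "down_proj"]
```

Under these assumptions the adapter holds 41,943,040 weights. Stored as fp32, that is 167,772,160 bytes, just under the 167,832,240-byte adapter_model.safetensors in checkpoint-300; the small remainder is plausibly the safetensors header. The 335,922,386-byte optimizer.pt is roughly twice that figure. This is consistent with, though not proof of, Adam keeping two fp32 moment tensors per trained weight.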
checkpoint-300/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:67780d5284cd1c059d99826e0cec6b5f8e1aa17cc818632dbf1f772a9ed7c594
size 167832240
checkpoint-300/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0db96968c04f4ec5a8e7abe67b907d7cadc0f99006087d87d15448de890f648e
size 335922386
checkpoint-300/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47e2457cae0016bbcda384d81f2b91e8f4e899f8dadfcdfdf008f8936e92324e
size 14244
checkpoint-300/scaler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6206ce0a70f6ce599823529d33dc15bcfed77431fdfb979429d85dfc33e2eede
size 988
checkpoint-300/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8580816c0ed7ca6e8664cb2fbfe83f21883509719cd34d37cc3c06e8020236a
size 1064
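scheduler.pt stores the learning-rate scheduler state. The learning_rate values logged in trainer_state.json fit a linear warmup followed by a half-cosine decay. The sketch below reproduces them under the following assumptions, all inferred from the log rather than read from training_args.bin:

- a peak rate of 2e-4;
- 54 warmup steps, which is ceil(0.03 × 1785), what a warmup_ratio of 0.03 would give;
- max_steps of 1785, i.e. 5 epochs × 357 steps, matching epoch = step / 357 in the log;
- a scheduler that trails global_step by 3 steps from about step 20 onward.

One plausible cause of that lag is optimizer steps skipped by the fp16 loss scaler whose state is in scaler.pt, but that is a guess.

```python
import math

# All constants below are inferred from the logged values (assumptions).
PEAK_LR = 2e-4
STEPS_PER_EPOCH = 357                          # logged epoch == step / 357
TOTAL_STEPS = 5 * STEPS_PER_EPOCH              # 1785, the logged max_steps
WARMUP_STEPS = math.ceil(0.03 * TOTAL_STEPS)   # 54
SCHEDULER_LAG = 3                              # global_step minus scheduler step


def cosine_with_warmup_lr(sched_step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at TOTAL_STEPS.

    Same shape as the formula behind transformers'
    get_cosine_schedule_with_warmup with its default num_cycles=0.5.
    """
    if sched_step < WARMUP_STEPS:
        factor = sched_step / max(1, WARMUP_STEPS)
    else:
        progress = (sched_step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
        factor = max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
    return PEAK_LR * factor


def expected_logged_lr(global_step: int) -> float:
    """Learning rate expected in log_history at `global_step`, given the lag."""
    return cosine_with_warmup_lr(global_step - SCHEDULER_LAG)
```

For example, `expected_logged_lr(1000)` lands on the logged 8.598085379830831e-05. Note that the very first logged point (step 10, 2.962962962962963e-05) corresponds to a lag of 2 rather than 3.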
checkpoint-300/special_tokens_map.json
ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
checkpoint-300/tokenizer.json
ADDED
The diff for this file is too large to render.
checkpoint-300/tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
size 587404
checkpoint-300/tokenizer_config.json
ADDED
The diff for this file is too large to render.
checkpoint-300/trainer_state.json
ADDED
@@ -0,0 +1,277 @@
{
  "best_global_step": 300,
  "best_metric": 1.0371309518814087,
  "best_model_checkpoint": "./results/survey-finetuned-Mistral-7B-Instruct-v0.3/checkpoint-300",
  "epoch": 0.8403361344537815,
  "eval_steps": 100,
  "global_step": 300,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.028011204481792718,
      "grad_norm": 5.871906757354736,
      "learning_rate": 2.962962962962963e-05,
      "loss": 2.3552,
      "step": 10
    },
    {
      "epoch": 0.056022408963585436,
      "grad_norm": 2.389425039291382,
      "learning_rate": 6.296296296296296e-05,
      "loss": 1.7094,
      "step": 20
    },
    {
      "epoch": 0.08403361344537816,
      "grad_norm": 2.160494804382324,
      "learning_rate": 0.0001,
      "loss": 1.4552,
      "step": 30
    },
    {
      "epoch": 0.11204481792717087,
      "grad_norm": 2.3963708877563477,
      "learning_rate": 0.00013703703703703705,
      "loss": 1.2928,
      "step": 40
    },
    {
      "epoch": 0.1400560224089636,
      "grad_norm": 2.192927598953247,
      "learning_rate": 0.00017407407407407408,
      "loss": 1.1542,
      "step": 50
    },
    {
      "epoch": 0.16806722689075632,
      "grad_norm": 1.6872493028640747,
      "learning_rate": 0.00019999851776504612,
      "loss": 1.1238,
      "step": 60
    },
    {
      "epoch": 0.19607843137254902,
      "grad_norm": 1.8370214700698853,
      "learning_rate": 0.00019997216814377874,
      "loss": 1.1234,
      "step": 70
    },
    {
      "epoch": 0.22408963585434175,
      "grad_norm": 1.7727515697479248,
      "learning_rate": 0.0001999128899579094,
      "loss": 1.1143,
      "step": 80
    },
    {
      "epoch": 0.25210084033613445,
      "grad_norm": 1.5649667978286743,
      "learning_rate": 0.00019982070273232797,
      "loss": 1.112,
      "step": 90
    },
    {
      "epoch": 0.2801120448179272,
      "grad_norm": 1.5107511281967163,
      "learning_rate": 0.0001996956368314155,
      "loss": 1.1154,
      "step": 100
    },
    {
      "epoch": 0.2801120448179272,
      "eval_loss": 1.0852699279785156,
      "eval_runtime": 25.3067,
      "eval_samples_per_second": 14.107,
      "eval_steps_per_second": 1.778,
      "step": 100
    },
    {
      "epoch": 0.3081232492997199,
      "grad_norm": 1.5419102907180786,
      "learning_rate": 0.00019953773344904302,
      "loss": 1.0686,
      "step": 110
    },
    {
      "epoch": 0.33613445378151263,
      "grad_norm": 1.5007447004318237,
      "learning_rate": 0.00019934704459500312,
      "loss": 1.0849,
      "step": 120
    },
    {
      "epoch": 0.3641456582633053,
      "grad_norm": 1.67251455783844,
      "learning_rate": 0.000199123633077879,
      "loss": 1.0793,
      "step": 130
    },
    {
      "epoch": 0.39215686274509803,
      "grad_norm": 1.4843816757202148,
      "learning_rate": 0.00019886757248435704,
      "loss": 1.0122,
      "step": 140
    },
    {
      "epoch": 0.42016806722689076,
      "grad_norm": 1.4718296527862549,
      "learning_rate": 0.00019857894715498873,
      "loss": 1.0423,
      "step": 150
    },
    {
      "epoch": 0.4481792717086835,
      "grad_norm": 1.4704537391662598,
      "learning_rate": 0.00019825785215641086,
      "loss": 1.0524,
      "step": 160
    },
    {
      "epoch": 0.47619047619047616,
      "grad_norm": 1.4213836193084717,
      "learning_rate": 0.0001979043932500329,
      "loss": 1.0373,
      "step": 170
    },
    {
      "epoch": 0.5042016806722689,
      "grad_norm": 1.579439401626587,
      "learning_rate": 0.00019751868685720134,
      "loss": 1.0539,
      "step": 180
    },
    {
      "epoch": 0.5322128851540616,
      "grad_norm": 1.4878956079483032,
      "learning_rate": 0.00019710086002085335,
      "loss": 1.0315,
      "step": 190
    },
    {
      "epoch": 0.5602240896358543,
      "grad_norm": 1.5998858213424683,
      "learning_rate": 0.0001966510503636716,
      "loss": 1.038,
      "step": 200
    },
    {
      "epoch": 0.5602240896358543,
      "eval_loss": 1.0578807592391968,
      "eval_runtime": 25.2375,
      "eval_samples_per_second": 14.146,
      "eval_steps_per_second": 1.783,
      "step": 200
    },
    {
      "epoch": 0.5882352941176471,
      "grad_norm": 1.5214334726333618,
      "learning_rate": 0.00019616940604275454,
      "loss": 1.0515,
      "step": 210
    },
    {
      "epoch": 0.6162464985994398,
      "grad_norm": 1.7388910055160522,
      "learning_rate": 0.00019565608570081672,
      "loss": 1.07,
      "step": 220
    },
    {
      "epoch": 0.6442577030812325,
      "grad_norm": 1.5043017864227295,
      "learning_rate": 0.00019511125841393558,
      "loss": 1.0545,
      "step": 230
    },
    {
      "epoch": 0.6722689075630253,
      "grad_norm": 1.5737371444702148,
      "learning_rate": 0.00019453510363586155,
      "loss": 1.0546,
      "step": 240
    },
    {
      "epoch": 0.7002801120448179,
      "grad_norm": 1.5473017692565918,
      "learning_rate": 0.00019392781113891007,
      "loss": 1.0306,
      "step": 250
    },
    {
      "epoch": 0.7282913165266106,
      "grad_norm": 1.6050424575805664,
      "learning_rate": 0.00019328958095145496,
      "loss": 1.0285,
      "step": 260
    },
    {
      "epoch": 0.7563025210084033,
|
212 |
+
"grad_norm": 1.4448118209838867,
|
213 |
+
"learning_rate": 0.00019262062329204356,
|
214 |
+
"loss": 1.0444,
|
215 |
+
"step": 270
|
216 |
+
},
|
217 |
+
{
|
218 |
+
"epoch": 0.7843137254901961,
|
219 |
+
"grad_norm": 1.7479037046432495,
|
220 |
+
"learning_rate": 0.0001919211585001554,
|
221 |
+
"loss": 1.0341,
|
222 |
+
"step": 280
|
223 |
+
},
|
224 |
+
{
|
225 |
+
"epoch": 0.8123249299719888,
|
226 |
+
"grad_norm": 1.4101306200027466,
|
227 |
+
"learning_rate": 0.00019119141696362759,
|
228 |
+
"loss": 1.0244,
|
229 |
+
"step": 290
|
230 |
+
},
|
231 |
+
{
|
232 |
+
"epoch": 0.8403361344537815,
|
233 |
+
"grad_norm": 1.4832830429077148,
|
234 |
+
"learning_rate": 0.00019043163904277016,
|
235 |
+
"loss": 1.0407,
|
236 |
+
"step": 300
|
237 |
+
},
|
238 |
+
{
|
239 |
+
"epoch": 0.8403361344537815,
|
240 |
+
"eval_loss": 1.0371309518814087,
|
241 |
+
"eval_runtime": 25.2977,
|
242 |
+
"eval_samples_per_second": 14.112,
|
243 |
+
"eval_steps_per_second": 1.779,
|
244 |
+
"step": 300
|
245 |
+
}
|
246 |
+
],
|
247 |
+
"logging_steps": 10,
|
248 |
+
"max_steps": 1785,
|
249 |
+
"num_input_tokens_seen": 0,
|
250 |
+
"num_train_epochs": 5,
|
251 |
+
"save_steps": 100,
|
252 |
+
"stateful_callbacks": {
|
253 |
+
"EarlyStoppingCallback": {
|
254 |
+
"args": {
|
255 |
+
"early_stopping_patience": 12,
|
256 |
+
"early_stopping_threshold": 0.0
|
257 |
+
},
|
258 |
+
"attributes": {
|
259 |
+
"early_stopping_patience_counter": 0
|
260 |
+
}
|
261 |
+
},
|
262 |
+
"TrainerControl": {
|
263 |
+
"args": {
|
264 |
+
"should_epoch_stop": false,
|
265 |
+
"should_evaluate": false,
|
266 |
+
"should_log": false,
|
267 |
+
"should_save": true,
|
268 |
+
"should_training_stop": false
|
269 |
+
},
|
270 |
+
"attributes": {}
|
271 |
+
}
|
272 |
+
},
|
273 |
+
"total_flos": 2.004525617774592e+16,
|
274 |
+
"train_batch_size": 8,
|
275 |
+
"trial_name": null,
|
276 |
+
"trial_params": null
|
277 |
+
}
|
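The `log_history` above mixes two kinds of entries: training logs every `logging_steps` (10) steps and evaluation logs every 100 steps. The following is a minimal standard-library sketch, using an inline excerpt of the values shown here, that separates the two kinds of entry and finds the best `eval_loss`. It also recovers the steps per epoch from the `step`/`epoch` ratio (300 / 0.8403... = 357, and 357 × 5 epochs gives the 1785 `max_steps`). The `patience_counter` function is a hypothetical, simplified re-implementation written for illustration, not the `EarlyStoppingCallback` source. With `early_stopping_threshold` at 0.0, every strict decrease in `eval_loss` resets it, which is consistent with the stored `early_stopping_patience_counter` of 0.

```python
import json

# Excerpt of the checkpoint-300 trainer_state.json shown above;
# only the keys this sketch reads are included.
TRAINER_STATE = json.loads("""
{
  "max_steps": 1785,
  "num_train_epochs": 5,
  "log_history": [
    {"epoch": 0.2801120448179272, "loss": 1.1154, "step": 100},
    {"epoch": 0.2801120448179272, "eval_loss": 1.0852699279785156, "step": 100},
    {"epoch": 0.5602240896358543, "loss": 1.038, "step": 200},
    {"epoch": 0.5602240896358543, "eval_loss": 1.0578807592391968, "step": 200},
    {"epoch": 0.8403361344537815, "loss": 1.0407, "step": 300},
    {"epoch": 0.8403361344537815, "eval_loss": 1.0371309518814087, "step": 300}
  ]
}
""")


def split_history(log_history):
    """Separate training-loss entries from evaluation entries."""
    train = [e for e in log_history if "loss" in e]
    evals = [e for e in log_history if "eval_loss" in e]
    return train, evals


def steps_per_epoch(entry):
    """Recover optimizer steps per epoch from an entry's step/epoch ratio."""
    return round(entry["step"] / entry["epoch"])


def patience_counter(eval_losses, threshold=0.0):
    """Simplified early-stopping counter (lower eval_loss is better):
    reset on an improvement larger than `threshold`, otherwise increment."""
    best, counter = None, 0
    for loss in eval_losses:
        if best is None or (loss < best and abs(loss - best) > threshold):
            best, counter = loss, 0
        else:
            counter += 1
    return counter


train, evals = split_history(TRAINER_STATE["log_history"])
best = min(evals, key=lambda e: e["eval_loss"])
spe = steps_per_epoch(evals[-1])
counter = patience_counter([e["eval_loss"] for e in evals])

print("best eval step:", best["step"], "eval_loss:", best["eval_loss"])
print("steps/epoch:", spe, "total:", spe * TRAINER_STATE["num_train_epochs"])
print("patience counter:", counter)
```

Applied to the full `trainer_state.json` loaded from disk, the same functions work unchanged, because the sketch only relies on the presence of the `loss` or `eval_loss` key in each entry.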
checkpoint-300/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3e4f1545faf17e9e0c7738e6282a9a7eae62ded595af804131816e84a7e34c23
size 5368
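`training_args.bin` is stored through Git LFS, so the diff shows only the three-line pointer file rather than the 5,368-byte payload. The following is a minimal standard-library sketch of reading such a pointer and checking a downloaded payload against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the parsing covers only the `version`, `oid` and `size` keys that appear here.

```python
import hashlib

# The pointer text exactly as committed above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:3e4f1545faf17e9e0c7738e6282a9a7eae62ded595af804131816e84a7e34c23
size 5368
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(payload, pointer):
    """True if raw bytes have the size and SHA-256 digest recorded in the pointer."""
    return (
        pointer["algo"] == "sha256"
        and len(payload) == pointer["size"]
        and hashlib.sha256(payload).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```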
config.json
ADDED
@@ -0,0 +1,41 @@
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "quantization_config": {
    "_load_in_4bit": false,
    "_load_in_8bit": true,
    "bnb_4bit_compute_dtype": "float32",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_use_double_quant": false,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": false,
    "load_in_8bit": true,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.51.1",
  "use_cache": true,
  "vocab_size": 32768
}
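The architecture fields in `config.json` determine the model size. The sketch below counts parameters from them. It assumes the standard Mistral decoder layout in `transformers`:

* bias-free q/k/v/o projections
* a gated SiLU MLP with gate, up and down projections
* two RMSNorm weight vectors per layer, plus a final norm
* an untied `lm_head`, since `tie_word_embeddings` is false

The 8-bit bitsandbytes quantization changes how the weights are stored, not how many there are. The sketch also derives the grouped-query attention ratio (32 query heads sharing 8 key/value heads) and the per-token KV-cache size, assuming 2-byte (fp16/bf16) cache entries. The helper names are hypothetical.

```python
# Architecture fields copied from config.json above.
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 32768,
    "tie_word_embeddings": False,
}


def count_params(cfg):
    """Parameter count under the assumed Mistral layout (no biases)."""
    h, d = cfg["hidden_size"], cfg["head_dim"]
    q_dim = cfg["num_attention_heads"] * d          # 32 * 128 = 4096
    kv_dim = cfg["num_key_value_heads"] * d         # 8 * 128 = 1024 (GQA)
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h   # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]          # gate, up, down projections
    norms = 2 * h                                   # input + post-attention RMSNorm
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


def kv_cache_bytes_per_token(cfg, bytes_per_value=2):
    """Keys + values, per layer, per KV head, per head dimension."""
    return (2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"]
            * cfg["head_dim"] * bytes_per_value)


total = count_params(CONFIG)
queries_per_kv_head = CONFIG["num_attention_heads"] // CONFIG["num_key_value_heads"]
kv_bytes = kv_cache_bytes_per_token(CONFIG)

print(f"parameters: {total:,}")
print("query heads per KV head:", queries_per_kv_head)
print("KV cache bytes per token:", kv_bytes)
```

Under these assumptions the count is 7,248,023,552 parameters (about 7.25B). Each query head group shares one of the 8 KV heads. The KV cache needs 131,072 bytes (128 KiB) per token of context.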
evaluation_results.json
ADDED
@@ -0,0 +1,125 @@
{
  "overall": {
    "bleu": 0.07838348900432857,
    "rouge1": 0.4160263728442303,
    "rouge2": 0.12298112561122547,
    "rougeL": 0.22705529378080125,
    "bertscore_f1": 0.888710463643074,
    "survey_quality": 0.8665336862809816,
    "length_sim": 0.8866010588429448,
    "sentence_count_sim": 0.9129999999999999,
    "sentiment_sim": 0.8
  },
  "domain_averages": {
    "healthcare": {
      "bleu": 0.07762374191145698,
      "rouge1": 0.42243627141751405,
      "rouge2": 0.12138260144908483,
      "rougeL": 0.23979926785693628,
      "bertscore_f1": 0.8950019955635071,
      "survey_quality": 0.8852262899250766,
      "length_sim": 0.8856788697752295,
      "sentence_count_sim": 0.9200000000000002,
      "sentiment_sim": 0.85
    },
    "education": {
      "bleu": 0.07919017058872602,
      "rouge1": 0.4403319723577309,
      "rouge2": 0.1241865105142728,
      "rougeL": 0.22697034499313967,
      "bertscore_f1": 0.8952235221862793,
      "survey_quality": 0.9229134973562958,
      "length_sim": 0.9287404920688876,
      "sentence_count_sim": 0.9400000000000001,
      "sentiment_sim": 0.9
    },
    "consumer_preferences": {
      "bleu": 0.04953106502738471,
      "rouge1": 0.37131249695672947,
      "rouge2": 0.09227813132714946,
      "rougeL": 0.1949837207349378,
      "bertscore_f1": 0.879725044965744,
      "survey_quality": 0.8690244742478145,
      "length_sim": 0.8970734227434429,
      "sentence_count_sim": 0.9600000000000002,
      "sentiment_sim": 0.75
    },
    "technology": {
      "bleu": 0.05616084930469599,
      "rouge1": 0.3952421299607451,
      "rouge2": 0.1009719226005579,
      "rougeL": 0.21178035991009433,
      "bertscore_f1": 0.8853113055229187,
      "survey_quality": 0.8866657940325023,
      "length_sim": 0.914997382097507,
      "sentence_count_sim": 0.8950000000000001,
      "sentiment_sim": 0.85
    },
    "social_issues": {
      "bleu": 0.10838373156460235,
      "rouge1": 0.44608640880018047,
      "rouge2": 0.14331504500490966,
      "rougeL": 0.2531005998587629,
      "bertscore_f1": 0.8945880949497222,
      "survey_quality": 0.8324464426080483,
      "length_sim": 0.8473393278241452,
      "sentence_count_sim": 0.9000000000000001,
      "sentiment_sim": 0.75
    },
    "environment": {
      "bleu": 0.09321296997822225,
      "rouge1": 0.44902727106616364,
      "rouge2": 0.14650465867302334,
      "rougeL": 0.25364371179912665,
      "bertscore_f1": 0.8934633910655976,
      "survey_quality": 0.9090866835595361,
      "length_sim": 0.9005933840119422,
      "sentence_count_sim": 0.9266666666666667,
      "sentiment_sim": 0.9
    },
    "finance": {
      "bleu": 0.13880939950659477,
      "rouge1": 0.43427252204059863,
      "rouge2": 0.19057459235218183,
      "rougeL": 0.2678512413926847,
      "bertscore_f1": 0.8983778953552246,
      "survey_quality": 0.8420706606186332,
      "length_sim": 0.8912119818558993,
      "sentence_count_sim": 0.835,
      "sentiment_sim": 0.8
    },
    "work_experience": {
      "bleu": 0.06487372536521903,
      "rouge1": 0.4167909303194882,
      "rouge2": 0.1182192780555242,
      "rougeL": 0.21887790025908646,
      "bertscore_f1": 0.8860084712505341,
      "survey_quality": 0.8390712140097654,
      "length_sim": 0.8672136420292962,
      "sentence_count_sim": 0.9,
      "sentiment_sim": 0.75
    },
    "lifestyle": {
      "bleu": 0.04366261544127066,
      "rouge1": 0.37809352760789394,
      "rouge2": 0.08183308983704551,
      "rougeL": 0.18753344087026794,
      "bertscore_f1": 0.8770935416221619,
      "survey_quality": 0.8502691198859227,
      "length_sim": 0.8741406929911015,
      "sentence_count_sim": 0.9266666666666667,
      "sentiment_sim": 0.75
    },
    "demographics": {
      "bleu": 0.07238662135511283,
      "rouge1": 0.40667019791525794,
      "rouge2": 0.11054542629850524,
      "rougeL": 0.21601235013297587,
      "bertscore_f1": 0.8823113739490509,
      "survey_quality": 0.8285626865662209,
      "length_sim": 0.8590213930319963,
      "sentence_count_sim": 0.9266666666666665,
      "sentiment_sim": 0.7
    }
  }
}
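The `overall` block is consistent with an unweighted mean of the ten `domain_averages` entries. This is equivalent to a pooled mean if each domain contributed the same number of samples. The following small sketch checks this for `bleu` and `sentiment_sim` and finds the lowest- and highest-scoring domain for a metric. Values are copied from the file above, and the function names are illustrative.

```python
import math

# Per-domain values copied from evaluation_results.json above.
DOMAIN_AVERAGES = {
    "healthcare": {"bleu": 0.07762374191145698, "sentiment_sim": 0.85},
    "education": {"bleu": 0.07919017058872602, "sentiment_sim": 0.9},
    "consumer_preferences": {"bleu": 0.04953106502738471, "sentiment_sim": 0.75},
    "technology": {"bleu": 0.05616084930469599, "sentiment_sim": 0.85},
    "social_issues": {"bleu": 0.10838373156460235, "sentiment_sim": 0.75},
    "environment": {"bleu": 0.09321296997822225, "sentiment_sim": 0.9},
    "finance": {"bleu": 0.13880939950659477, "sentiment_sim": 0.8},
    "work_experience": {"bleu": 0.06487372536521903, "sentiment_sim": 0.75},
    "lifestyle": {"bleu": 0.04366261544127066, "sentiment_sim": 0.75},
    "demographics": {"bleu": 0.07238662135511283, "sentiment_sim": 0.7},
}
OVERALL = {"bleu": 0.07838348900432857, "sentiment_sim": 0.8}


def macro_average(domains, metric):
    """Unweighted mean of one metric across all domains."""
    values = [scores[metric] for scores in domains.values()]
    return sum(values) / len(values)


def extreme_domains(domains, metric):
    """Return (lowest, highest) domain names for a metric."""
    ranked = sorted(domains, key=lambda name: domains[name][metric])
    return ranked[0], ranked[-1]


for metric in ("bleu", "sentiment_sim"):
    avg = macro_average(DOMAIN_AVERAGES, metric)
    print(metric, avg, math.isclose(avg, OVERALL[metric], rel_tol=1e-9))
print("bleu lowest/highest:", extreme_domains(DOMAIN_AVERAGES, "bleu"))
```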
evaluation_viz/domain_comparison.png
ADDED
evaluation_viz/metrics_correlation.png
ADDED
evaluation_viz/overall_metrics.png
ADDED
evaluation_viz/sample_comparisons.csv
ADDED
@@ -0,0 +1,11 @@
Domain,Question,Reference,Generated,BLEU,ROUGE-L,Survey Quality
healthcare,How often did you get help as soon as you wanted after you pressed the call button?,"As an experienced nurse, I understand how crucial it is for patients to receive timely assistance after pressing the call button, especially those with limited English proficiency who may feel even more vulnerable. In my experience, our team strives to respond promptly to call button alerts, as we recognize that delays can lead to anxiety and discomfort. However, there are times when our response may not be immediate due to high patient volumes or unexpected emergencies. We are always working to improve our processes to ensure that every patient feels heard and receives the help they need as soon as possible. Clear communication and understanding are key, and we make every effort to ensure that language barriers do not impede the quality of care we provide.","nderstand the importance of timely responses when it comes to patient care. In my experience, patients with limited English proficiency often face delays in receiving help, especially when there are language barriers. It's crucial for healthcare providers to ensure that they have the resources and support needed to communicate effectively with patients, so they can receive the help they need promptly. This includes investing in language services and training for staff to enhance their ability to communicate effectively with patients, regardless of their language background.",0.08389948425814907,0.18867924528301885,0.8281879194630872
healthcare,How often was the area around your room quiet at night?,"As a medical doctor, I understand the importance of a good night's sleep for overall health, so I value a quiet environment at night. Typically, I would say that the area around my room is quiet most of the time, which is crucial for rest and recovery. However, there are occasional disturbances, such as outside traffic or neighborhood activity. I'm interested in exploring integrative approaches, such as sound therapy or relaxation techniques, to manage the impact of any noise. It's important for me to maintain a peaceful environment to ensure I can be alert and focused for my patients during the day.","l medical doctor, I find it crucial to maintain a peaceful environment for both my patients and myself. A quiet night is essential for restorative sleep, which is vital for overall health. In my experience, the area around my room is usually quite peaceful, allowing me to focus on my practice and rest without disturbance. However, I do notice the occasional noise from late-night activities or city life, which can be a reminder of the need for urban planning that prioritizes peace and wellness. Overall, a peaceful environment is a cornerstone of holistic health, and I make it a priority in my own life.",0.1578953325822474,0.24761904761904763,0.8255208333333334
healthcare,"Before giving you any new medicine, how often did hospital staff describe possible side effects in a way you could understand?","As a traditional medical doctor, I recognize the importance of clear communication about potential side effects when prescribing new medications. In my experience, hospital staff generally do a good job of explaining these aspects, but there is always room for improvement. Typically, they provide the necessary information, but the challenge often lies in tailoring the explanation to the patient's level of understanding. I believe it’s crucial to ensure that patients fully grasp the potential risks and benefits, as this empowers them to make informed decisions about their health. Moving forward, I’m interested in exploring how integrative approaches might enhance this communication process, perhaps by incorporating holistic perspectives that consider the patient’s overall well-being.","l medical doctor, I have always valued clear communication, especially when it comes to potential side effects of new medications. In my experience, hospital staff have consistently done a commendable job in explaining the side effects of new treatments in a way that is both understandable and comprehensive. They often use simple language and provide written materials, ensuring that patients can review the information at their own pace. However, I believe that integrating alternative treatments like sound therapy could benefit from similar communication standards. It's crucial that patients fully understand the potential benefits and risks of any treatment, whether it's a conventional medicine or an alternative therapy like sound therapy.",0.13520814569504014,0.30434782608695654,0.7927927927927928
healthcare,How often was the area around your room quiet at night?,"As a hospital administrator, ensuring a quiet and restful environment for our patients at night is one of our top priorities. We have invested in various measures, such as soundproofing, staff training, and noise reduction protocols, to minimize disturbances. While there are occasional challenges, such as emergencies or necessary medical interventions, we strive to maintain a peaceful atmosphere. Feedback from our patients generally indicates that the area around their rooms remains quiet most nights, allowing them to rest and recover effectively.","dministrator, I've always prioritized a calm environment for our patients, especially at night when they need restful sleep for recovery. However, with the hustle and bustle of a busy hospital, it can be challenging to ensure that the area around patient rooms is consistently quiet. We've implemented measures like soundproofing and staff training, but there are always areas for improvement. If the software solutions can help reduce noise levels or even monitor them, that would be a significant step forward in enhancing patient care.",0.08703668011357521,0.20359281437125748,0.9878787878787879
healthcare,"During this hospital stay, how often did doctors listen carefully to you?","As a specialist doctor, I understand the critical importance of listening carefully to patients, as it forms the foundation for accurate diagnosis and effective treatment plans. In my experience during hospital stays—whether as a patient or observing colleagues—I have always noted that the majority of doctors make a concerted effort to listen to their patients, although time constraints can sometimes be a challenge. Personally, I prioritize active listening, ensuring that my patients feel heard and understood, because it not only builds trust but also often reveals crucial information that might not come up otherwise. However, I recognize that the level of attentiveness can vary among practitioners and situations.","doctor, I've always placed a high priority on clear and effective communication, both with patients and colleagues. During this hospital stay, I found that doctors consistently took the time to listen carefully to my input. They demonstrated a genuine interest in understanding the latest pharmaceutical products and how they can enhance patient care. This collaboration is crucial for ensuring that we can make informed decisions that benefit our patients, and I appreciate the focus on shared learning and expertise. Overall, the discussions were productive and informative, highlighting the importance of continued collaboration in healthcare.",0.0,0.1951219512195122,0.8828282828282829
healthcare,How often was your pain well controlled during this hospital stay?,"As a healthcare provider, I can't personally answer this question since it pertains to a patient's experience during a hospital stay. However, I can emphasize the importance of this question in understanding patient satisfaction and the effectiveness of pain management protocols. From my perspective, consistently managing pain well is crucial for patient recovery and overall experience. In developing a telemedicine app, ensuring that patients can easily communicate their pain levels and receive timely interventions is a priority, as it can enhance their comfort and trust in their care provider.","provider focused on developing a secure telemedicine app, I deeply understand the importance of effective pain management. While I haven't experienced hospital stays myself, I often hear from patients that they struggle with controlling their pain. That's why we're developing an app that integrates secure communication between patients and healthcare providers, ensuring that pain management plans are more effective and timely. By streamlining communication, we aim to empower patients to manage their pain more efficiently, reducing the stress and discomfort associated with traditional methods.",0.0,0.21348314606741575,0.990990990990991
healthcare,How often was your pain well controlled during this hospital stay?,"As a healthcare administrator, I don't personally experience the care, but I can share insights based on patient feedback from our recent equipment upgrade. Since upgrading our inventory, we've received positive feedback about improved pain management, largely due to advanced monitoring and updated analgesic delivery systems. Patients have reported more precise pain control and quicker response times from staff, as the new equipment allows for better tracking and notification. While there's always room for improvement, these upgrades have significantly contributed to patient comfort during their stay.","administrator, I don't personally experience pain during a hospital stay, but I can share insights based on feedback from patients and staff. In our recent upgrade, we prioritized acquiring advanced equipment that could accurately monitor and manage patient pain, ensuring timely and effective interventions. From what I've heard, patients are generally more satisfied with their pain management, thanks to more accurate assessments and tailored treatment plans. Of course, there's always room for improvement, and we're continually evaluating how to further enhance our pain management strategies.",0.21364596957236429,0.3595505617977528,0.9884057971014494
healthcare,"During this hospital stay, how often did nurses explain things in a way you could understand?","As a hospital administrator, I've observed that effective communication is key to improving patient care and satisfaction. Our nurses are highly trained and generally excel at explaining medical procedures and care plans in understandable terms. However, I also recognize there are occasional challenges due to the complexity of medical information. To address this, we're exploring innovative software solutions that can provide nurses with real-time access to simplified medical explanations and visual aids, ensuring that every patient fully comprehends their care. This initiative is part of our commitment to streamline processes and enhance the overall patient experience.","dministrator, I place a high priority on clear and effective communication between healthcare providers and patients, especially when it comes to explaining complex medical information. During this hospital stay, I found that nurses consistently took the time to break down medical procedures and treatment plans in a way that was easy to understand, using simple language and even providing visual aids when necessary. This approach not only helped in reducing anxiety but also empowered patients to make informed decisions about their care. Overall, the nurses' ability to communicate effectively was commendable and contributed to a more positive experience for everyone involved.",0.0,0.21,0.925068870523416
healthcare,How often was the area around your room quiet at night?,"In our facility, maintaining a quiet and restful environment at night is a top priority since it directly impacts patient recovery and satisfaction. We have implemented several noise-reduction strategies, such as soft-closing doors and designated quiet hours, to ensure a peaceful atmosphere. However, I acknowledge that there are occasional disturbances due to necessary medical procedures or emergencies, which we try to minimize as much as possible. We continually assess and improve our practices based on patient feedback to strike the right balance between essential care and maintaining tranquility.","a large healthcare facility, I understand the importance of a peaceful environment for patients, especially at night when restful sleep is crucial for recovery. I make it a priority to ensure our facility is as quiet as possible, especially around patient rooms. Our staff is trained to minimize noise, and we've implemented soundproofing measures where necessary. While there might be occasional disturbances, our team is proactive in addressing them, ensuring that patients can rest comfortably. Overall, I'd say our facility is quite tranquil at night, which is a testament to our commitment to patient care and well-being.",0.0,0.22105263157894736,0.8910052910052911
healthcare,"Before giving you any new medicine, how often did hospital staff describe possible side effects in a way you could understand?","As a healthcare researcher with a keen interest in neural networks for medical diagnosis, I find it crucial that communication about medication and its side effects is clear and comprehensible. In my experience, hospital staff generally do a good job of explaining potential side effects, though there can be variability depending on the staff's experience and workload. It's important for medical professionals to ensure that patients fully understand the information, as this can significantly impact patient outcomes and adherence to treatment. Ideally, I would like to see more consistent use of tools, possibly even AI-driven, to standardize and improve these communications. Such tools could provide tailored explanations based on a patient’s health literacy level, which I believe would be an excellent application of neural networks in healthcare.","researcher focused on neural networks, I deeply value clear communication and transparency in healthcare. In my experience, hospital staff consistently strive to explain potential side effects of new medications in a clear and understandable manner. They often use layman's terms or provide visual aids, which greatly enhances comprehension. However, there's always room for improvement, especially in ensuring that patients fully grasp the information, which is crucial for informed decision-making. Integrating neural networks could potentially enhance this process by providing more personalized explanations based on patient data.",0.09855180689319373,0.2545454545454545,0.7395833333333334
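All ten sample rows above come from the healthcare domain. Sentence-level BLEU is exactly 0.0 on four of them, while ROUGE-L never falls below about 0.19. A common cause of exact zeros is unsmoothed sentence-level BLEU, which becomes 0 whenever one n-gram order (typically 4-grams) has no match. However, the metric implementation used for this file is not shown. The following standard-library sketch summarizes the numeric columns. `load_scores` reads the CSV by its header names (`BLEU`, `ROUGE-L`, `Survey Quality`), and the inline `ROWS` list copies those three columns from the rows above so that the example runs without the file. Both function names are hypothetical.

```python
import csv

# (BLEU, ROUGE-L, Survey Quality) for the ten rows above, in file order.
ROWS = [
    (0.08389948425814907, 0.18867924528301885, 0.8281879194630872),
    (0.1578953325822474, 0.24761904761904763, 0.8255208333333334),
    (0.13520814569504014, 0.30434782608695654, 0.7927927927927928),
    (0.08703668011357521, 0.20359281437125748, 0.9878787878787879),
    (0.0, 0.1951219512195122, 0.8828282828282829),
    (0.0, 0.21348314606741575, 0.990990990990991),
    (0.21364596957236429, 0.3595505617977528, 0.9884057971014494),
    (0.0, 0.21, 0.925068870523416),
    (0.0, 0.22105263157894736, 0.8910052910052911),
    (0.09855180689319373, 0.2545454545454545, 0.7395833333333334),
]


def load_scores(path):
    """Read (BLEU, ROUGE-L, Survey Quality) tuples from sample_comparisons.csv."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            (float(r["BLEU"]), float(r["ROUGE-L"]), float(r["Survey Quality"]))
            for r in csv.DictReader(f)
        ]


def summarize(scores):
    """Count zero-BLEU rows and report simple aggregates of the other columns."""
    n = len(scores)
    return {
        "n": n,
        "zero_bleu": sum(1 for bleu, _, _ in scores if bleu == 0.0),
        "mean_bleu": sum(bleu for bleu, _, _ in scores) / n,
        "min_rouge_l": min(rouge for _, rouge, _ in scores),
        "mean_survey_quality": sum(q for _, _, q in scores) / n,
    }


print(summarize(ROWS))
```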
evaluation_viz/survey_metrics.png
ADDED
generation_config.json
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.51.1"
}