Commit f7ad3dc by aryashah00 · verified · 1 parent: a540b1c

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. README.md +111 -0
  2. adapter_config.json +37 -0
  3. adapter_model.safetensors +3 -0
  4. checkpoint-1400/README.md +202 -0
  5. checkpoint-1400/adapter_config.json +37 -0
  6. checkpoint-1400/adapter_model.safetensors +3 -0
  7. checkpoint-1400/optimizer.pt +3 -0
  8. checkpoint-1400/rng_state.pth +3 -0
  9. checkpoint-1400/scaler.pt +3 -0
  10. checkpoint-1400/scheduler.pt +3 -0
  11. checkpoint-1400/special_tokens_map.json +24 -0
  12. checkpoint-1400/tokenizer.json +0 -0
  13. checkpoint-1400/tokenizer.model +3 -0
  14. checkpoint-1400/tokenizer_config.json +0 -0
  15. checkpoint-1400/trainer_state.json +1135 -0
  16. checkpoint-1400/training_args.bin +3 -0
  17. checkpoint-1500/README.md +202 -0
  18. checkpoint-1500/adapter_config.json +37 -0
  19. checkpoint-1500/adapter_model.safetensors +3 -0
  20. checkpoint-1500/optimizer.pt +3 -0
  21. checkpoint-1500/rng_state.pth +3 -0
  22. checkpoint-1500/scaler.pt +3 -0
  23. checkpoint-1500/scheduler.pt +3 -0
  24. checkpoint-1500/special_tokens_map.json +24 -0
  25. checkpoint-1500/tokenizer.json +0 -0
  26. checkpoint-1500/tokenizer.model +3 -0
  27. checkpoint-1500/tokenizer_config.json +0 -0
  28. checkpoint-1500/trainer_state.json +1213 -0
  29. checkpoint-1500/training_args.bin +3 -0
  30. checkpoint-300/README.md +202 -0
  31. checkpoint-300/adapter_config.json +37 -0
  32. checkpoint-300/adapter_model.safetensors +3 -0
  33. checkpoint-300/optimizer.pt +3 -0
  34. checkpoint-300/rng_state.pth +3 -0
  35. checkpoint-300/scaler.pt +3 -0
  36. checkpoint-300/scheduler.pt +3 -0
  37. checkpoint-300/special_tokens_map.json +24 -0
  38. checkpoint-300/tokenizer.json +0 -0
  39. checkpoint-300/tokenizer.model +3 -0
  40. checkpoint-300/tokenizer_config.json +0 -0
  41. checkpoint-300/trainer_state.json +277 -0
  42. checkpoint-300/training_args.bin +3 -0
  43. config.json +41 -0
  44. evaluation_results.json +125 -0
  45. evaluation_viz/domain_comparison.png +0 -0
  46. evaluation_viz/metrics_correlation.png +0 -0
  47. evaluation_viz/overall_metrics.png +0 -0
  48. evaluation_viz/sample_comparisons.csv +11 -0
  49. evaluation_viz/survey_metrics.png +0 -0
  50. generation_config.json +6 -0
README.md ADDED
@@ -0,0 +1,111 @@
+
+ # aryashah00/survey-finetuned-Mistral-7B-Instruct-v0.3
+
+ ## Model Description
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), optimized for generating synthetic survey responses across multiple domains. It was instruction-tuned on a custom dataset of survey responses, each written from the perspective of a specific persona.
+
+ ## Training Data
+
+ - **Dataset Size**: ~3,000 examples
+ - **Domains**: 10 domains, including healthcare and education
+ - **Format**: ChatML instruction format with system and user prompts
+
+ ## Training Details
+
+ - **Base Model**: [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
+ - **Training Method**: Parameter-Efficient Fine-Tuning with LoRA
+ - **LoRA Parameters**: r=16, alpha=32, dropout=0.05
+ - **Training Setup**:
+   - Batch Size: 8
+   - Learning Rate: 0.0002
+   - Epochs: 5
+
+
+
+ ## Usage
+
+ This model is designed for generating synthetic survey responses from different personas. It works best when provided with:
+ 1. A detailed persona description
+ 2. A specific survey question
+
+ ### Python Example
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained("aryashah00/survey-finetuned-Mistral-7B-Instruct-v0.3", device_map="auto", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("aryashah00/survey-finetuned-Mistral-7B-Instruct-v0.3", trust_remote_code=True)
+
+ # Define persona and question
+ persona = "A nurse who educates the child about modern medical treatments and encourages a balanced approach to healthcare"
+ question = "How often was your pain well controlled during this hospital stay?"
+
+ # Prepare prompts
+ system_prompt = f"You are embodying the following persona: {persona}"
+ user_prompt = f"Survey Question: {question}\n\nPlease provide your honest and detailed response to this question."
+
+ # Create message format
+ messages = [
+     {"role": "system", "content": system_prompt},
+     {"role": "user", "content": user_prompt}
+ ]
+
+ # Apply chat template
+ input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ # Tokenize
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)
+
+ # Generate response
+ with torch.no_grad():
+     output_ids = model.generate(
+         input_ids=input_ids,
+         max_new_tokens=256,
+         temperature=0.7,
+         top_p=0.9,
+         do_sample=True
+     )
+
+ # Decode only the newly generated tokens. Searching the decoded text for
+ # input_text is unreliable, because skip_special_tokens=True strips special
+ # tokens that input_text may still contain.
+ prompt_length = input_ids.shape[-1]
+ new_tokens = output_ids[0][prompt_length:]
+ generated_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+
+ print(f"Generated response: {generated_response}")
+ ```
+
+ ### Inference API Example
+
+ ```python
+ import requests
+
+ API_URL = "https://api-inference.huggingface.co/models/aryashah00/survey-finetuned-Mistral-7B-Instruct-v0.3"
+ headers = {"Authorization": "Bearer YOUR_API_KEY"}
+
+ def query(payload):
+     response = requests.post(API_URL, headers=headers, json=payload)
+     return response.json()
+
+ messages = [
+     {"role": "system", "content": "You are embodying the following persona: A nurse who educates the child about modern medical treatments and encourages a balanced approach to healthcare"},
+     {"role": "user", "content": "Survey Question: How often was your pain well controlled during this hospital stay?\n\nPlease provide your honest and detailed response to this question."}
+ ]
+
+ output = query({"inputs": messages})
+ print(output)
+ ```
+
+ ## Limitations
+
+ - The model is optimized for survey response generation and may not perform well on other tasks
+ - Response quality depends on the clarity and specificity of the persona and question
+ - The model may occasionally generate responses that don't fully align with the given persona
+
+ ## License
+
+ This model follows the license of the base model [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3).
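The Training Setup in this README lists only the peak learning rate. As a rough sketch, the learning rates logged in checkpoint-1400/trainer_state.json (part of this same commit) look consistent with linear warmup followed by cosine decay. The constants `WARMUP_STEPS` and `TOTAL_STEPS` and the helper `lr_at` are assumptions inferred from those logged values (about 357 optimizer steps per epoch over 5 epochs); none of them is stated in the source.

```python
import math

PEAK_LR = 2e-4       # "Learning Rate: 0.0002" from the README
WARMUP_STEPS = 54    # assumption: inferred from the logged values
TOTAL_STEPS = 1785   # assumption: 357 steps per epoch x 5 epochs


def lr_at(scheduler_step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / WARMUP_STEPS
    progress = (scheduler_step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the schedule at a few scheduler steps.
for step in (8, 27, 57):
    print(step, lr_at(step))
```

Under these assumptions, scheduler steps 8, 27, and 57 come out close to the values logged at training steps 10, 30, and 60. The source does not explain the small offset between the two step counts.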
adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+   "bias": "none",
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "o_proj",
+     "up_proj",
+     "gate_proj",
+     "q_proj",
+     "k_proj",
+     "v_proj",
+     "down_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
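The `r`, `lora_alpha`, and `use_rslora` fields above together determine how strongly the low-rank update is scaled. A minimal sketch of that arithmetic follows; the helper name `lora_scaling` is hypothetical, and the dictionary is only a hand-copied subset of the config above.

```python
import json
import math

# Hand-copied subset of the adapter_config.json fields shown above.
adapter_config = json.loads(
    '{"r": 16, "lora_alpha": 32, "lora_dropout": 0.05, "use_rslora": false}'
)


def lora_scaling(config: dict) -> float:
    """Return the multiplier applied to the low-rank update B @ A."""
    rank = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(rank)
    return alpha / rank


print(lora_scaling(adapter_config))
```

With `r=16` and `lora_alpha=32`, the update is scaled by 2.0.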
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67780d5284cd1c059d99826e0cec6b5f8e1aa17cc818632dbf1f772a9ed7c594
+ size 167832240
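The 167,832,240-byte size in this pointer can be sanity-checked against the adapter configuration. The sketch below counts LoRA parameters for rank 16 on the seven target modules. The layer dimensions are assumptions taken from the base Mistral-7B architecture (they do not appear in this diff), and 32-bit storage is likewise an assumption.

```python
# Assumed Mistral-7B dimensions (not shown in this diff).
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 14336
KV_SIZE = 1024       # 8 key/value heads x 128 dims per head
NUM_LAYERS = 32
RANK = 16            # "r" from adapter_config.json

# (in_features, out_features) for each target module.
PROJECTIONS = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, KV_SIZE),
    "v_proj": (HIDDEN_SIZE, KV_SIZE),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "gate_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "up_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "down_proj": (INTERMEDIATE_SIZE, HIDDEN_SIZE),
}


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """A is (rank x in_features) and B is (out_features x rank)."""
    return rank * (in_features + out_features)


params_per_layer = sum(
    lora_param_count(i, o, RANK) for i, o in PROJECTIONS.values()
)
total_params = params_per_layer * NUM_LAYERS
fp32_bytes = total_params * 4          # assumption: weights stored as float32
POINTER_SIZE = 167832240               # "size" from the LFS pointer above
overhead_bytes = POINTER_SIZE - fp32_bytes

print(total_params, fp32_bytes, overhead_bytes)
```

Under these assumptions the adapter holds about 41.9 million parameters, and the remaining few tens of kilobytes would be file metadata.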
checkpoint-1400/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: mistralai/Mistral-7B-Instruct-v0.3
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.14.0
checkpoint-1400/adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+   "bias": "none",
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "o_proj",
+     "up_proj",
+     "gate_proj",
+     "q_proj",
+     "k_proj",
+     "v_proj",
+     "down_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
checkpoint-1400/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c9fe12189951c27bc4ec00f17a9b085692fbb67a7dfe542f41378f72a0120ee
+ size 167832240
checkpoint-1400/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9dc057d2cdebfef0ca8530d4e60f72f82aa4325d54404ed7365bbd8cafee266
+ size 335922386
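Each binary file in this commit is stored as a Git LFS pointer: three `key value` lines giving the spec version, the content hash, and the byte size. As a minimal sketch, the pointers can be parsed and compared as follows. The helper name `parse_lfs_pointer` is hypothetical, and the two pointer texts are copied from the checkpoint-1400 entries above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


adapter_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5c9fe12189951c27bc4ec00f17a9b085692fbb67a7dfe542f41378f72a0120ee
size 167832240
"""

optimizer_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a9dc057d2cdebfef0ca8530d4e60f72f82aa4325d54404ed7365bbd8cafee266
size 335922386
"""

adapter = parse_lfs_pointer(adapter_pointer)
optimizer = parse_lfs_pointer(optimizer_pointer)
size_ratio = optimizer["size"] / adapter["size"]
print(f"optimizer.pt is {size_ratio:.3f}x the adapter size")
```

The optimizer state is roughly twice the size of the adapter weights. That would be consistent with an Adam-style optimizer keeping two state tensors per trainable parameter, although the optimizer is not named anywhere in this diff.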
checkpoint-1400/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:df43d159e9e4ceee25b89f491cc92d93845d8fad5eff3afe150b55e30ed9f075
+ size 14244
checkpoint-1400/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c270be34d4aa9e6eae88f85b9db4fb0a97071ab2b89975397702aa1d9b12293
+ size 988
checkpoint-1400/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:969f48d239c448886bc74ccec632df22e96e26849a01486ea41159b4b9a7db28
+ size 1064
checkpoint-1400/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-1400/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1400/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+ size 587404
checkpoint-1400/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1400/trainer_state.json ADDED
@@ -0,0 +1,1135 @@
+ {
+   "best_global_step": 300,
+   "best_metric": 1.0371309518814087,
+   "best_model_checkpoint": "./results/survey-finetuned-Mistral-7B-Instruct-v0.3/checkpoint-300",
+   "epoch": 3.9215686274509802,
+   "eval_steps": 100,
+   "global_step": 1400,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.028011204481792718,
+       "grad_norm": 5.871906757354736,
+       "learning_rate": 2.962962962962963e-05,
+       "loss": 2.3552,
+       "step": 10
+     },
+     {
+       "epoch": 0.056022408963585436,
+       "grad_norm": 2.389425039291382,
+       "learning_rate": 6.296296296296296e-05,
+       "loss": 1.7094,
+       "step": 20
+     },
+     {
+       "epoch": 0.08403361344537816,
+       "grad_norm": 2.160494804382324,
+       "learning_rate": 0.0001,
+       "loss": 1.4552,
+       "step": 30
+     },
+     {
+       "epoch": 0.11204481792717087,
+       "grad_norm": 2.3963708877563477,
+       "learning_rate": 0.00013703703703703705,
+       "loss": 1.2928,
+       "step": 40
+     },
+     {
+       "epoch": 0.1400560224089636,
+       "grad_norm": 2.192927598953247,
+       "learning_rate": 0.00017407407407407408,
+       "loss": 1.1542,
+       "step": 50
+     },
+     {
+       "epoch": 0.16806722689075632,
+       "grad_norm": 1.6872493028640747,
+       "learning_rate": 0.00019999851776504612,
+       "loss": 1.1238,
+       "step": 60
+     },
+     {
+       "epoch": 0.19607843137254902,
+       "grad_norm": 1.8370214700698853,
+       "learning_rate": 0.00019997216814377874,
+       "loss": 1.1234,
+       "step": 70
+     },
+     {
+       "epoch": 0.22408963585434175,
+       "grad_norm": 1.7727515697479248,
+       "learning_rate": 0.0001999128899579094,
+       "loss": 1.1143,
+       "step": 80
+     },
+     {
+       "epoch": 0.25210084033613445,
+       "grad_norm": 1.5649667978286743,
+       "learning_rate": 0.00019982070273232797,
+       "loss": 1.112,
+       "step": 90
+     },
+     {
+       "epoch": 0.2801120448179272,
+       "grad_norm": 1.5107511281967163,
+       "learning_rate": 0.0001996956368314155,
+       "loss": 1.1154,
+       "step": 100
+     },
+     {
+       "epoch": 0.2801120448179272,
+       "eval_loss": 1.0852699279785156,
+       "eval_runtime": 25.3067,
+       "eval_samples_per_second": 14.107,
+       "eval_steps_per_second": 1.778,
+       "step": 100
+     },
+     {
+       "epoch": 0.3081232492997199,
+       "grad_norm": 1.5419102907180786,
+       "learning_rate": 0.00019953773344904302,
+       "loss": 1.0686,
+       "step": 110
+     },
+     {
+       "epoch": 0.33613445378151263,
+       "grad_norm": 1.5007447004318237,
+       "learning_rate": 0.00019934704459500312,
+       "loss": 1.0849,
+       "step": 120
+     },
+     {
+       "epoch": 0.3641456582633053,
+       "grad_norm": 1.67251455783844,
+       "learning_rate": 0.000199123633077879,
+       "loss": 1.0793,
+       "step": 130
+     },
+     {
+       "epoch": 0.39215686274509803,
+       "grad_norm": 1.4843816757202148,
+       "learning_rate": 0.00019886757248435704,
+       "loss": 1.0122,
+       "step": 140
+     },
+     {
+       "epoch": 0.42016806722689076,
+       "grad_norm": 1.4718296527862549,
+       "learning_rate": 0.00019857894715498873,
+       "loss": 1.0423,
+       "step": 150
+     },
+     {
+       "epoch": 0.4481792717086835,
+       "grad_norm": 1.4704537391662598,
+       "learning_rate": 0.00019825785215641086,
+       "loss": 1.0524,
+       "step": 160
+     },
+     {
+       "epoch": 0.47619047619047616,
+       "grad_norm": 1.4213836193084717,
+       "learning_rate": 0.0001979043932500329,
+       "loss": 1.0373,
+       "step": 170
+     },
+     {
+       "epoch": 0.5042016806722689,
+       "grad_norm": 1.579439401626587,
+       "learning_rate": 0.00019751868685720134,
+       "loss": 1.0539,
+       "step": 180
+     },
+     {
+       "epoch": 0.5322128851540616,
+       "grad_norm": 1.4878956079483032,
+       "learning_rate": 0.00019710086002085335,
+       "loss": 1.0315,
+       "step": 190
+     },
+     {
+       "epoch": 0.5602240896358543,
+       "grad_norm": 1.5998858213424683,
+       "learning_rate": 0.0001966510503636716,
+       "loss": 1.038,
+       "step": 200
+     },
+     {
+       "epoch": 0.5602240896358543,
+       "eval_loss": 1.0578807592391968,
+       "eval_runtime": 25.2375,
+       "eval_samples_per_second": 14.146,
+       "eval_steps_per_second": 1.783,
+       "step": 200
+     },
+     {
+       "epoch": 0.5882352941176471,
+       "grad_norm": 1.5214334726333618,
+       "learning_rate": 0.00019616940604275454,
+       "loss": 1.0515,
+       "step": 210
+     },
+     {
+       "epoch": 0.6162464985994398,
+       "grad_norm": 1.7388910055160522,
+       "learning_rate": 0.00019565608570081672,
+       "loss": 1.07,
+       "step": 220
+     },
+     {
+       "epoch": 0.6442577030812325,
+       "grad_norm": 1.5043017864227295,
+       "learning_rate": 0.00019511125841393558,
+       "loss": 1.0545,
+       "step": 230
+     },
+     {
+       "epoch": 0.6722689075630253,
+       "grad_norm": 1.5737371444702148,
+       "learning_rate": 0.00019453510363586155,
+       "loss": 1.0546,
+       "step": 240
+     },
+     {
+       "epoch": 0.7002801120448179,
+       "grad_norm": 1.5473017692565918,
+       "learning_rate": 0.00019392781113891007,
+       "loss": 1.0306,
+       "step": 250
+     },
+     {
+       "epoch": 0.7282913165266106,
+       "grad_norm": 1.6050424575805664,
+       "learning_rate": 0.00019328958095145496,
+       "loss": 1.0285,
+       "step": 260
+     },
+     {
+       "epoch": 0.7563025210084033,
+       "grad_norm": 1.4448118209838867,
+       "learning_rate": 0.00019262062329204356,
+       "loss": 1.0444,
+       "step": 270
+     },
+     {
+       "epoch": 0.7843137254901961,
+       "grad_norm": 1.7479037046432495,
+       "learning_rate": 0.0001919211585001554,
+       "loss": 1.0341,
+       "step": 280
+     },
+     {
+       "epoch": 0.8123249299719888,
+       "grad_norm": 1.4101306200027466,
+       "learning_rate": 0.00019119141696362759,
+       "loss": 1.0244,
+       "step": 290
+     },
+     {
+       "epoch": 0.8403361344537815,
+       "grad_norm": 1.4832830429077148,
+       "learning_rate": 0.00019043163904277016,
+       "loss": 1.0407,
+       "step": 300
+     },
+     {
+       "epoch": 0.8403361344537815,
+       "eval_loss": 1.0371309518814087,
+       "eval_runtime": 25.2977,
+       "eval_samples_per_second": 14.112,
+       "eval_steps_per_second": 1.779,
+       "step": 300
+     },
+     {
+       "epoch": 0.8683473389355743,
+       "grad_norm": 1.4755833148956299,
+       "learning_rate": 0.00018964207499119686,
+       "loss": 1.0462,
+       "step": 310
+     },
+     {
+       "epoch": 0.896358543417367,
+       "grad_norm": 1.4913283586502075,
+       "learning_rate": 0.0001888229848733972,
+       "loss": 1.0474,
+       "step": 320
+     },
+     {
+       "epoch": 0.9243697478991597,
+       "grad_norm": 1.7317923307418823,
+       "learning_rate": 0.0001879746384790773,
+       "loss": 1.0447,
+       "step": 330
+     },
+     {
+       "epoch": 0.9523809523809523,
+       "grad_norm": 1.5321850776672363,
+       "learning_rate": 0.0001870973152342971,
+       "loss": 1.0225,
+       "step": 340
+     },
+     {
+       "epoch": 0.9803921568627451,
+       "grad_norm": 1.5264012813568115,
+       "learning_rate": 0.00018619130410943386,
+       "loss": 1.0527,
+       "step": 350
+     },
+     {
+       "epoch": 1.0084033613445378,
+       "grad_norm": 1.5999464988708496,
+       "learning_rate": 0.0001852569035240018,
+       "loss": 0.957,
+       "step": 360
+     },
+     {
+       "epoch": 1.0364145658263306,
+       "grad_norm": 1.4261401891708374,
+       "learning_rate": 0.00018429442124835954,
+       "loss": 0.6759,
+       "step": 370
+     },
+     {
+       "epoch": 1.0644257703081232,
+       "grad_norm": 1.6350306272506714,
+       "learning_rate": 0.00018330417430233756,
+       "loss": 0.6721,
+       "step": 380
+     },
+     {
+       "epoch": 1.092436974789916,
+       "grad_norm": 1.4598591327667236,
+       "learning_rate": 0.00018228648885081886,
+       "loss": 0.6695,
+       "step": 390
+     },
+     {
+       "epoch": 1.1204481792717087,
+       "grad_norm": 1.6912418603897095,
+       "learning_rate": 0.00018124170009630787,
+       "loss": 0.6482,
+       "step": 400
+     },
+     {
+       "epoch": 1.1204481792717087,
+       "eval_loss": 1.0602205991744995,
+       "eval_runtime": 25.4667,
+       "eval_samples_per_second": 14.018,
+       "eval_steps_per_second": 1.767,
+       "step": 400
+     },
+     {
+       "epoch": 1.1484593837535013,
+       "grad_norm": 1.6862925291061401,
+       "learning_rate": 0.0001801701521685223,
+       "loss": 0.6921,
+       "step": 410
+     },
+     {
+       "epoch": 1.1764705882352942,
+       "grad_norm": 1.8303461074829102,
+       "learning_rate": 0.00017907219801104458,
+       "loss": 0.6826,
+       "step": 420
+     },
+     {
+       "epoch": 1.2044817927170868,
+       "grad_norm": 2.0269150733947754,
+       "learning_rate": 0.00017794819926507014,
+       "loss": 0.6765,
+       "step": 430
+     },
+     {
+       "epoch": 1.2324929971988796,
+       "grad_norm": 1.7937819957733154,
+       "learning_rate": 0.00017679852615029123,
+       "loss": 0.6884,
+       "step": 440
+     },
+     {
+       "epoch": 1.2605042016806722,
+       "grad_norm": 1.731198787689209,
+       "learning_rate": 0.00017562355734295477,
+       "loss": 0.6783,
+       "step": 450
+     },
+     {
+       "epoch": 1.2885154061624648,
+       "grad_norm": 1.8679757118225098,
+       "learning_rate": 0.00017442367985113504,
+       "loss": 0.6689,
+       "step": 460
+     },
+     {
+       "epoch": 1.3165266106442577,
+       "grad_norm": 1.6848735809326172,
+       "learning_rate": 0.0001731992888872621,
+       "loss": 0.6937,
+       "step": 470
+     },
+     {
+       "epoch": 1.3445378151260505,
+       "grad_norm": 1.5047993659973145,
+       "learning_rate": 0.00017195078773794755,
+       "loss": 0.6919,
+       "step": 480
+     },
+     {
+       "epoch": 1.3725490196078431,
+       "grad_norm": 1.7145918607711792,
+       "learning_rate": 0.00017067858763115132,
+       "loss": 0.6694,
+       "step": 490
+     },
+     {
+       "epoch": 1.4005602240896358,
+       "grad_norm": 1.6695507764816284,
+       "learning_rate": 0.00016938310760073236,
+       "loss": 0.6665,
+       "step": 500
+     },
+     {
+       "epoch": 1.4005602240896358,
+       "eval_loss": 1.0960317850112915,
+       "eval_runtime": 25.678,
+       "eval_samples_per_second": 13.903,
+       "eval_steps_per_second": 1.752,
+       "step": 500
+     },
+     {
+       "epoch": 1.4285714285714286,
+       "grad_norm": 1.9052048921585083,
+       "learning_rate": 0.00016806477434842836,
+       "loss": 0.6711,
+       "step": 510
+     },
+     {
+       "epoch": 1.4565826330532212,
+       "grad_norm": 1.7252157926559448,
+       "learning_rate": 0.00016672402210331,
+       "loss": 0.7007,
+       "step": 520
+     },
+     {
+       "epoch": 1.484593837535014,
+       "grad_norm": 1.7970322370529175,
+       "learning_rate": 0.00016536129247875567,
+       "loss": 0.7164,
+       "step": 530
+     },
+     {
+       "epoch": 1.5126050420168067,
+       "grad_norm": 1.7098366022109985,
+       "learning_rate": 0.00016397703432699404,
+       "loss": 0.7117,
+       "step": 540
+     },
+     {
+       "epoch": 1.5406162464985993,
+       "grad_norm": 1.853852391242981,
+       "learning_rate": 0.00016257170359126237,
+       "loss": 0.7123,
+       "step": 550
+     },
+     {
+       "epoch": 1.5686274509803921,
+       "grad_norm": 2.119030714035034,
+       "learning_rate": 0.000161145763155629,
+       "loss": 0.6952,
+       "step": 560
+     },
+     {
+       "epoch": 1.596638655462185,
+       "grad_norm": 1.7882400751113892,
+       "learning_rate": 0.00015969968269253013,
+       "loss": 0.6957,
+       "step": 570
+     },
+     {
+       "epoch": 1.6246498599439776,
+       "grad_norm": 2.0184831619262695,
+       "learning_rate": 0.00015823393850806998,
+       "loss": 0.6972,
+       "step": 580
+     },
+     {
+       "epoch": 1.6526610644257702,
+       "grad_norm": 1.5946452617645264,
+       "learning_rate": 0.00015674901338513672,
+       "loss": 0.6888,
+       "step": 590
+     },
+     {
+       "epoch": 1.680672268907563,
+       "grad_norm": 1.7266991138458252,
+       "learning_rate": 0.0001552453964243845,
+       "loss": 0.7029,
+       "step": 600
+     },
+     {
+       "epoch": 1.680672268907563,
+       "eval_loss": 1.0715969800949097,
+       "eval_runtime": 25.7893,
+       "eval_samples_per_second": 13.843,
+       "eval_steps_per_second": 1.745,
+       "step": 600
+     },
480
+ {
481
+ "epoch": 1.708683473389356,
482
+ "grad_norm": 1.7430881261825562,
483
+ "learning_rate": 0.000153723582883135,
484
+ "loss": 0.7032,
485
+ "step": 610
486
+ },
487
+ {
488
+ "epoch": 1.7366946778711485,
489
+ "grad_norm": 1.88009774684906,
490
+ "learning_rate": 0.00015218407401225057,
491
+ "loss": 0.6902,
492
+ "step": 620
493
+ },
494
+ {
495
+ "epoch": 1.7647058823529411,
496
+ "grad_norm": 1.7102521657943726,
497
+ "learning_rate": 0.000150627376891034,
498
+ "loss": 0.6866,
499
+ "step": 630
500
+ },
501
+ {
502
+ "epoch": 1.7927170868347337,
503
+ "grad_norm": 1.854864239692688,
504
+ "learning_rate": 0.0001490540042602077,
505
+ "loss": 0.6944,
506
+ "step": 640
507
+ },
508
+ {
509
+ "epoch": 1.8207282913165266,
510
+ "grad_norm": 1.61717689037323,
511
+ "learning_rate": 0.0001474644743530288,
512
+ "loss": 0.7043,
513
+ "step": 650
514
+ },
515
+ {
516
+ "epoch": 1.8487394957983194,
517
+ "grad_norm": 1.7749794721603394,
518
+ "learning_rate": 0.00014585931072459474,
519
+ "loss": 0.7008,
520
+ "step": 660
521
+ },
522
+ {
523
+ "epoch": 1.876750700280112,
524
+ "grad_norm": 1.7132004499435425,
525
+ "learning_rate": 0.00014423904207939615,
526
+ "loss": 0.7167,
527
+ "step": 670
528
+ },
529
+ {
530
+ "epoch": 1.9047619047619047,
531
+ "grad_norm": 1.784264326095581,
532
+ "learning_rate": 0.0001426042020971736,
533
+ "loss": 0.7203,
534
+ "step": 680
535
+ },
536
+ {
537
+ "epoch": 1.9327731092436975,
538
+ "grad_norm": 1.8548870086669922,
539
+ "learning_rate": 0.00014095532925713553,
540
+ "loss": 0.7198,
541
+ "step": 690
542
+ },
543
+ {
544
+ "epoch": 1.9607843137254903,
545
+ "grad_norm": 1.4964855909347534,
546
+ "learning_rate": 0.00013929296666059544,
547
+ "loss": 0.7053,
548
+ "step": 700
549
+ },
550
+ {
551
+ "epoch": 1.9607843137254903,
552
+ "eval_loss": 1.0620192289352417,
553
+ "eval_runtime": 25.7481,
554
+ "eval_samples_per_second": 13.865,
555
+ "eval_steps_per_second": 1.748,
556
+ "step": 700
557
+ },
558
+ {
559
+ "epoch": 1.988795518207283,
560
+ "grad_norm": 1.7073538303375244,
561
+ "learning_rate": 0.00013761766185208663,
562
+ "loss": 0.7338,
563
+ "step": 710
564
+ },
565
+ {
566
+ "epoch": 2.0168067226890756,
567
+ "grad_norm": 1.4009623527526855,
568
+ "learning_rate": 0.00013592996663901354,
569
+ "loss": 0.4904,
570
+ "step": 720
571
+ },
572
+ {
573
+ "epoch": 2.044817927170868,
574
+ "grad_norm": 1.6378936767578125,
575
+ "learning_rate": 0.00013423043690989874,
576
+ "loss": 0.3822,
577
+ "step": 730
578
+ },
579
+ {
580
+ "epoch": 2.0728291316526612,
581
+ "grad_norm": 1.8590551614761353,
582
+ "learning_rate": 0.00013251963245128603,
583
+ "loss": 0.3615,
584
+ "step": 740
585
+ },
586
+ {
587
+ "epoch": 2.100840336134454,
588
+ "grad_norm": 1.7983025312423706,
589
+ "learning_rate": 0.00013079811676335935,
590
+ "loss": 0.356,
591
+ "step": 750
592
+ },
593
+ {
594
+ "epoch": 2.1288515406162465,
595
+ "grad_norm": 2.0292270183563232,
596
+ "learning_rate": 0.00012906645687433857,
597
+ "loss": 0.3672,
598
+ "step": 760
599
+ },
600
+ {
601
+ "epoch": 2.156862745098039,
602
+ "grad_norm": 1.774778127670288,
603
+ "learning_rate": 0.00012732522315371338,
604
+ "loss": 0.3758,
605
+ "step": 770
606
+ },
607
+ {
608
+ "epoch": 2.184873949579832,
609
+ "grad_norm": 1.9023045301437378,
610
+ "learning_rate": 0.0001255749891243764,
611
+ "loss": 0.3674,
612
+ "step": 780
613
+ },
614
+ {
615
+ "epoch": 2.212885154061625,
616
+ "grad_norm": 1.4395744800567627,
617
+ "learning_rate": 0.00012381633127371783,
618
+ "loss": 0.3723,
619
+ "step": 790
620
+ },
621
+ {
622
+ "epoch": 2.2408963585434174,
623
+ "grad_norm": 1.6131172180175781,
624
+ "learning_rate": 0.00012204982886374348,
625
+ "loss": 0.3699,
626
+ "step": 800
627
+ },
628
+ {
629
+ "epoch": 2.2408963585434174,
630
+ "eval_loss": 1.2616853713989258,
631
+ "eval_runtime": 26.08,
632
+ "eval_samples_per_second": 13.689,
633
+ "eval_steps_per_second": 1.725,
634
+ "step": 800
635
+ },
636
+ {
637
+ "epoch": 2.26890756302521,
638
+ "grad_norm": 1.782162070274353,
639
+ "learning_rate": 0.00012027606374027915,
640
+ "loss": 0.3834,
641
+ "step": 810
642
+ },
643
+ {
644
+ "epoch": 2.2969187675070026,
645
+ "grad_norm": 1.7493244409561157,
646
+ "learning_rate": 0.00011849562014132367,
647
+ "loss": 0.3722,
648
+ "step": 820
649
+ },
650
+ {
651
+ "epoch": 2.3249299719887957,
652
+ "grad_norm": 1.5457279682159424,
653
+ "learning_rate": 0.00011670908450461421,
654
+ "loss": 0.3836,
655
+ "step": 830
656
+ },
657
+ {
658
+ "epoch": 2.3529411764705883,
659
+ "grad_norm": 1.914865255355835,
660
+ "learning_rate": 0.00011491704527446705,
661
+ "loss": 0.3929,
662
+ "step": 840
663
+ },
664
+ {
665
+ "epoch": 2.380952380952381,
666
+ "grad_norm": 1.9673585891723633,
667
+ "learning_rate": 0.00011312009270795729,
668
+ "loss": 0.3736,
669
+ "step": 850
670
+ },
671
+ {
672
+ "epoch": 2.4089635854341735,
673
+ "grad_norm": 1.508853554725647,
674
+ "learning_rate": 0.00011131881868050152,
675
+ "loss": 0.3818,
676
+ "step": 860
677
+ },
678
+ {
679
+ "epoch": 2.4369747899159666,
680
+ "grad_norm": 1.633651852607727,
681
+ "learning_rate": 0.00010951381649090766,
682
+ "loss": 0.3706,
683
+ "step": 870
684
+ },
685
+ {
686
+ "epoch": 2.4649859943977592,
687
+ "grad_norm": 1.7917840480804443,
688
+ "learning_rate": 0.00010770568066595556,
689
+ "loss": 0.372,
690
+ "step": 880
691
+ },
692
+ {
693
+ "epoch": 2.492997198879552,
694
+ "grad_norm": 1.7907308340072632,
695
+ "learning_rate": 0.00010589500676457358,
696
+ "loss": 0.392,
697
+ "step": 890
698
+ },
699
+ {
700
+ "epoch": 2.5210084033613445,
701
+ "grad_norm": 1.6397713422775269,
702
+ "learning_rate": 0.00010408239118167482,
703
+ "loss": 0.3866,
704
+ "step": 900
705
+ },
706
+ {
707
+ "epoch": 2.5210084033613445,
708
+ "eval_loss": 1.254116177558899,
709
+ "eval_runtime": 26.2156,
710
+ "eval_samples_per_second": 13.618,
711
+ "eval_steps_per_second": 1.717,
712
+ "step": 900
713
+ },
714
+ {
715
+ "epoch": 2.549019607843137,
716
+ "grad_norm": 1.613955020904541,
717
+ "learning_rate": 0.00010226843095171827,
718
+ "loss": 0.3973,
719
+ "step": 910
720
+ },
721
+ {
722
+ "epoch": 2.5770308123249297,
723
+ "grad_norm": 1.7094041109085083,
724
+ "learning_rate": 0.00010045372355205915,
725
+ "loss": 0.3697,
726
+ "step": 920
727
+ },
728
+ {
729
+ "epoch": 2.6050420168067228,
730
+ "grad_norm": 1.6542613506317139,
731
+ "learning_rate": 9.86388667061534e-05,
732
+ "loss": 0.3773,
733
+ "step": 930
734
+ },
735
+ {
736
+ "epoch": 2.6330532212885154,
737
+ "grad_norm": 1.57974374294281,
738
+ "learning_rate": 9.68244581866812e-05,
739
+ "loss": 0.381,
740
+ "step": 940
741
+ },
742
+ {
743
+ "epoch": 2.661064425770308,
744
+ "grad_norm": 1.8591638803482056,
745
+ "learning_rate": 9.501109561865415e-05,
746
+ "loss": 0.3722,
747
+ "step": 950
748
+ },
749
+ {
750
+ "epoch": 2.689075630252101,
751
+ "grad_norm": 1.8032310009002686,
752
+ "learning_rate": 9.319937628257121e-05,
753
+ "loss": 0.3812,
754
+ "step": 960
755
+ },
756
+ {
757
+ "epoch": 2.7170868347338937,
758
+ "grad_norm": 1.6205816268920898,
759
+ "learning_rate": 9.1389896917688e-05,
760
+ "loss": 0.358,
761
+ "step": 970
762
+ },
763
+ {
764
+ "epoch": 2.7450980392156863,
765
+ "grad_norm": 1.5483547449111938,
766
+ "learning_rate": 8.958325352546433e-05,
767
+ "loss": 0.3829,
768
+ "step": 980
769
+ },
770
+ {
771
+ "epoch": 2.773109243697479,
772
+ "grad_norm": 1.7492879629135132,
773
+ "learning_rate": 8.778004117325508e-05,
774
+ "loss": 0.3832,
775
+ "step": 990
776
+ },
777
+ {
778
+ "epoch": 2.8011204481792715,
779
+ "grad_norm": 1.752061128616333,
780
+ "learning_rate": 8.598085379830831e-05,
781
+ "loss": 0.3708,
782
+ "step": 1000
783
+ },
784
+ {
785
+ "epoch": 2.8011204481792715,
786
+ "eval_loss": 1.242302417755127,
787
+ "eval_runtime": 26.0048,
788
+ "eval_samples_per_second": 13.728,
789
+ "eval_steps_per_second": 1.73,
790
+ "step": 1000
791
+ },
792
+ {
793
+ "epoch": 2.8291316526610646,
794
+ "grad_norm": 1.7680628299713135,
795
+ "learning_rate": 8.418628401213588e-05,
796
+ "loss": 0.3916,
797
+ "step": 1010
798
+ },
799
+ {
800
+ "epoch": 2.857142857142857,
801
+ "grad_norm": 2.042778253555298,
802
+ "learning_rate": 8.239692290532066e-05,
803
+ "loss": 0.375,
804
+ "step": 1020
805
+ },
806
+ {
807
+ "epoch": 2.88515406162465,
808
+ "grad_norm": 1.5636996030807495,
809
+ "learning_rate": 8.061335985282463e-05,
810
+ "loss": 0.371,
811
+ "step": 1030
812
+ },
813
+ {
814
+ "epoch": 2.9131652661064424,
815
+ "grad_norm": 2.1579694747924805,
816
+ "learning_rate": 7.883618231986223e-05,
817
+ "loss": 0.3728,
818
+ "step": 1040
819
+ },
820
+ {
821
+ "epoch": 2.9411764705882355,
822
+ "grad_norm": 1.8203998804092407,
823
+ "learning_rate": 7.706597566840258e-05,
824
+ "loss": 0.3752,
825
+ "step": 1050
826
+ },
827
+ {
828
+ "epoch": 2.969187675070028,
829
+ "grad_norm": 1.7312395572662354,
830
+ "learning_rate": 7.53033229643646e-05,
831
+ "loss": 0.3789,
832
+ "step": 1060
833
+ },
834
+ {
835
+ "epoch": 2.9971988795518207,
836
+ "grad_norm": 1.616680383682251,
837
+ "learning_rate": 7.354880478556818e-05,
838
+ "loss": 0.3682,
839
+ "step": 1070
840
+ },
841
+ {
842
+ "epoch": 3.0252100840336134,
843
+ "grad_norm": 1.6187553405761719,
844
+ "learning_rate": 7.18029990305052e-05,
845
+ "loss": 0.2418,
846
+ "step": 1080
847
+ },
848
+ {
849
+ "epoch": 3.053221288515406,
850
+ "grad_norm": 1.2909393310546875,
851
+ "learning_rate": 7.006648072799272e-05,
852
+ "loss": 0.2199,
853
+ "step": 1090
854
+ },
855
+ {
856
+ "epoch": 3.081232492997199,
857
+ "grad_norm": 1.4550460577011108,
858
+ "learning_rate": 6.833982184777165e-05,
859
+ "loss": 0.2417,
860
+ "step": 1100
861
+ },
862
+ {
863
+ "epoch": 3.081232492997199,
864
+ "eval_loss": 1.3649441003799438,
865
+ "eval_runtime": 25.9266,
866
+ "eval_samples_per_second": 13.77,
867
+ "eval_steps_per_second": 1.736,
868
+ "step": 1100
869
+ },
870
+ {
871
+ "epoch": 3.1092436974789917,
872
+ "grad_norm": 1.4254740476608276,
873
+ "learning_rate": 6.662359111211268e-05,
874
+ "loss": 0.2217,
875
+ "step": 1110
876
+ },
877
+ {
878
+ "epoch": 3.1372549019607843,
879
+ "grad_norm": 1.446631908416748,
880
+ "learning_rate": 6.491835380849238e-05,
881
+ "loss": 0.2312,
882
+ "step": 1120
883
+ },
884
+ {
885
+ "epoch": 3.165266106442577,
886
+ "grad_norm": 1.636902093887329,
887
+ "learning_rate": 6.322467160340011e-05,
888
+ "loss": 0.2215,
889
+ "step": 1130
890
+ },
891
+ {
892
+ "epoch": 3.19327731092437,
893
+ "grad_norm": 1.3956204652786255,
894
+ "learning_rate": 6.154310235733785e-05,
895
+ "loss": 0.2286,
896
+ "step": 1140
897
+ },
898
+ {
899
+ "epoch": 3.2212885154061626,
900
+ "grad_norm": 1.1625617742538452,
901
+ "learning_rate": 5.9874199941073636e-05,
902
+ "loss": 0.2402,
903
+ "step": 1150
904
+ },
905
+ {
906
+ "epoch": 3.249299719887955,
907
+ "grad_norm": 1.3581691980361938,
908
+ "learning_rate": 5.821851405320903e-05,
909
+ "loss": 0.2313,
910
+ "step": 1160
911
+ },
912
+ {
913
+ "epoch": 3.277310924369748,
914
+ "grad_norm": 1.2999801635742188,
915
+ "learning_rate": 5.657659003912104e-05,
916
+ "loss": 0.2251,
917
+ "step": 1170
918
+ },
919
+ {
920
+ "epoch": 3.3053221288515404,
921
+ "grad_norm": 1.4706475734710693,
922
+ "learning_rate": 5.494896871133743e-05,
923
+ "loss": 0.2213,
924
+ "step": 1180
925
+ },
926
+ {
927
+ "epoch": 3.3333333333333335,
928
+ "grad_norm": 1.4545742273330688,
929
+ "learning_rate": 5.3336186171405764e-05,
930
+ "loss": 0.2333,
931
+ "step": 1190
932
+ },
933
+ {
934
+ "epoch": 3.361344537815126,
935
+ "grad_norm": 1.2404979467391968,
936
+ "learning_rate": 5.173877363331311e-05,
937
+ "loss": 0.2211,
938
+ "step": 1200
939
+ },
940
+ {
941
+ "epoch": 3.361344537815126,
942
+ "eval_loss": 1.4273622035980225,
943
+ "eval_runtime": 26.0864,
944
+ "eval_samples_per_second": 13.685,
945
+ "eval_steps_per_second": 1.725,
946
+ "step": 1200
947
+ },
948
+ {
949
+ "epoch": 3.3893557422969187,
950
+ "grad_norm": 1.3979793787002563,
951
+ "learning_rate": 5.0157257248516896e-05,
952
+ "loss": 0.2291,
953
+ "step": 1210
954
+ },
955
+ {
956
+ "epoch": 3.4173669467787113,
957
+ "grad_norm": 1.7199064493179321,
958
+ "learning_rate": 4.859215793264198e-05,
959
+ "loss": 0.229,
960
+ "step": 1220
961
+ },
962
+ {
963
+ "epoch": 3.4453781512605044,
964
+ "grad_norm": 1.4441235065460205,
965
+ "learning_rate": 4.7043991193903244e-05,
966
+ "loss": 0.2329,
967
+ "step": 1230
968
+ },
969
+ {
970
+ "epoch": 3.473389355742297,
971
+ "grad_norm": 1.363365650177002,
972
+ "learning_rate": 4.5513266963308654e-05,
973
+ "loss": 0.2308,
974
+ "step": 1240
975
+ },
976
+ {
977
+ "epoch": 3.5014005602240896,
978
+ "grad_norm": 1.415418267250061,
979
+ "learning_rate": 4.400048942669962e-05,
980
+ "loss": 0.2267,
981
+ "step": 1250
982
+ },
983
+ {
984
+ "epoch": 3.5294117647058822,
985
+ "grad_norm": 1.3714072704315186,
986
+ "learning_rate": 4.25061568586836e-05,
987
+ "loss": 0.2227,
988
+ "step": 1260
989
+ },
990
+ {
991
+ "epoch": 3.557422969187675,
992
+ "grad_norm": 1.2111363410949707,
993
+ "learning_rate": 4.10307614585139e-05,
994
+ "loss": 0.2322,
995
+ "step": 1270
996
+ },
997
+ {
998
+ "epoch": 3.585434173669468,
999
+ "grad_norm": 1.3968279361724854,
1000
+ "learning_rate": 3.957478918797013e-05,
1001
+ "loss": 0.225,
1002
+ "step": 1280
1003
+ },
1004
+ {
1005
+ "epoch": 3.6134453781512605,
1006
+ "grad_norm": 1.4936352968215942,
1007
+ "learning_rate": 3.8138719611293595e-05,
1008
+ "loss": 0.2191,
1009
+ "step": 1290
1010
+ },
1011
+ {
1012
+ "epoch": 3.641456582633053,
1013
+ "grad_norm": 1.3193854093551636,
1014
+ "learning_rate": 3.6723025737229744e-05,
1015
+ "loss": 0.2251,
1016
+ "step": 1300
1017
+ },
1018
+ {
1019
+ "epoch": 3.641456582633053,
1020
+ "eval_loss": 1.4654343128204346,
1021
+ "eval_runtime": 26.0893,
1022
+ "eval_samples_per_second": 13.684,
1023
+ "eval_steps_per_second": 1.725,
1024
+ "step": 1300
1025
+ },
1026
+ {
1027
+ "epoch": 3.669467787114846,
1028
+ "grad_norm": 1.2740771770477295,
1029
+ "learning_rate": 3.532817386322964e-05,
1030
+ "loss": 0.2235,
1031
+ "step": 1310
1032
+ },
1033
+ {
1034
+ "epoch": 3.697478991596639,
1035
+ "grad_norm": 1.1999931335449219,
1036
+ "learning_rate": 3.395462342186244e-05,
1037
+ "loss": 0.2244,
1038
+ "step": 1320
1039
+ },
1040
+ {
1041
+ "epoch": 3.7254901960784315,
1042
+ "grad_norm": 1.4531569480895996,
1043
+ "learning_rate": 3.2602826829488296e-05,
1044
+ "loss": 0.2218,
1045
+ "step": 1330
1046
+ },
1047
+ {
1048
+ "epoch": 3.753501400560224,
1049
+ "grad_norm": 1.5755056142807007,
1050
+ "learning_rate": 3.127322933724317e-05,
1051
+ "loss": 0.2331,
1052
+ "step": 1340
1053
+ },
1054
+ {
1055
+ "epoch": 3.7815126050420167,
1056
+ "grad_norm": 1.2473876476287842,
1057
+ "learning_rate": 2.9966268884382808e-05,
1058
+ "loss": 0.2171,
1059
+ "step": 1350
1060
+ },
1061
+ {
1062
+ "epoch": 3.8095238095238093,
1063
+ "grad_norm": 1.3828644752502441,
1064
+ "learning_rate": 2.8682375954035756e-05,
1065
+ "loss": 0.2269,
1066
+ "step": 1360
1067
+ },
1068
+ {
1069
+ "epoch": 3.8375350140056024,
1070
+ "grad_norm": 1.473433494567871,
1071
+ "learning_rate": 2.7421973431411775e-05,
1072
+ "loss": 0.2121,
1073
+ "step": 1370
1074
+ },
1075
+ {
1076
+ "epoch": 3.865546218487395,
1077
+ "grad_norm": 1.1347614526748657,
1078
+ "learning_rate": 2.618547646451317e-05,
1079
+ "loss": 0.2187,
1080
+ "step": 1380
1081
+ },
1082
+ {
1083
+ "epoch": 3.8935574229691876,
1084
+ "grad_norm": 1.4064631462097168,
1085
+ "learning_rate": 2.497329232739438e-05,
1086
+ "loss": 0.2243,
1087
+ "step": 1390
1088
+ },
1089
+ {
1090
+ "epoch": 3.9215686274509802,
1091
+ "grad_norm": 1.394667148590088,
1092
+ "learning_rate": 2.378582028601529e-05,
1093
+ "loss": 0.222,
1094
+ "step": 1400
1095
+ },
1096
+ {
1097
+ "epoch": 3.9215686274509802,
1098
+ "eval_loss": 1.474218487739563,
1099
+ "eval_runtime": 26.0178,
1100
+ "eval_samples_per_second": 13.721,
1101
+ "eval_steps_per_second": 1.73,
1102
+ "step": 1400
1103
+ }
1104
+ ],
1105
+ "logging_steps": 10,
1106
+ "max_steps": 1785,
1107
+ "num_input_tokens_seen": 0,
1108
+ "num_train_epochs": 5,
1109
+ "save_steps": 100,
1110
+ "stateful_callbacks": {
1111
+ "EarlyStoppingCallback": {
1112
+ "args": {
1113
+ "early_stopping_patience": 12,
1114
+ "early_stopping_threshold": 0.0
1115
+ },
1116
+ "attributes": {
1117
+ "early_stopping_patience_counter": 11
1118
+ }
1119
+ },
1120
+ "TrainerControl": {
1121
+ "args": {
1122
+ "should_epoch_stop": false,
1123
+ "should_evaluate": false,
1124
+ "should_log": false,
1125
+ "should_save": true,
1126
+ "should_training_stop": false
1127
+ },
1128
+ "attributes": {}
1129
+ }
1130
+ },
1131
+ "total_flos": 9.429376435853722e+16,
1132
+ "train_batch_size": 8,
1133
+ "trial_name": null,
1134
+ "trial_params": null
1135
+ }
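The evaluation records in this checkpoint's `log_history` can be replayed to sanity-check the saved early-stopping counter. The following is a simplified sketch of patience counting, not the `transformers` implementation, and the function name `replay_patience` is hypothetical. The ten `eval_loss` values are those shown for steps 500 to 1400, and the starting best value is the `best_metric` that `checkpoint-1500/trainer_state.json` reports for step 300. The step-400 evaluation is not visible in this excerpt, so it is left out.

```python
from typing import Iterable, Tuple


def replay_patience(
    eval_losses: Iterable[float],
    best_so_far: float,
    threshold: float = 0.0,
) -> Tuple[float, int]:
    """Count consecutive evaluations that fail to beat the best loss.

    Simplified illustration: the counter resets whenever a loss improves
    on the running best by more than `threshold`, otherwise it increments.
    """
    best = best_so_far
    counter = 0
    for loss in eval_losses:
        if best - loss > threshold:
            best = loss
            counter = 0
        else:
            counter += 1
    return best, counter


# eval_loss at steps 500, 600, ..., 1400, copied from the log above.
EVAL_LOSSES_500_TO_1400 = [
    1.0960317850112915,
    1.0715969800949097,
    1.0620192289352417,
    1.2616853713989258,
    1.254116177558899,
    1.242302417755127,
    1.3649441003799438,
    1.4273622035980225,
    1.4654343128204346,
    1.474218487739563,
]
# best_metric reported for step 300 in checkpoint-1500/trainer_state.json.
BEST_AT_STEP_300 = 1.0371309518814087

best, counter = replay_patience(EVAL_LOSSES_500_TO_1400, BEST_AT_STEP_300)
print(f"best eval_loss: {best}, evaluations without improvement: {counter}")
```

None of these ten evaluations improves on the step-300 value, so the replay counts 10. The saved `early_stopping_patience_counter` of 11 would be consistent with the step-400 evaluation also failing to improve, although that value is not shown here.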
checkpoint-1400/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e4f1545faf17e9e0c7738e6282a9a7eae62ded595af804131816e84a7e34c23
+ size 5368
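`training_args.bin` is a binary file, so the scheduler settings are not visible in this diff. The logged `learning_rate` values are, however, numerically consistent with linear warmup followed by cosine decay. The sketch below rests on inferred assumptions that the diff does not state: a peak rate of 2e-4, 54 warmup steps, a decay horizon equal to `max_steps` (1785), and a scheduler that lags the logged `step` by 3 (which could happen if a few early optimizer steps were skipped under mixed precision). The function name `warmup_cosine_lr` is hypothetical.

```python
import math


def warmup_cosine_lr(
    scheduler_step: int,
    peak_lr: float = 2e-4,      # assumed peak learning rate
    warmup_steps: int = 54,     # assumed warmup length
    total_steps: int = 1785,    # max_steps from trainer_state.json
) -> float:
    """Linear warmup to peak_lr, then half-cosine decay towards zero."""
    if scheduler_step < warmup_steps:
        return peak_lr * scheduler_step / max(1, warmup_steps)
    progress = (scheduler_step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


ASSUMED_LAG = 3  # inferred offset between logged step and scheduler step

# (logged step, logged learning_rate) pairs copied from the logs in this diff.
LOGGED = [
    (30, 0.0001),
    (500, 0.00016938310760073236),
    (1400, 2.378582028601529e-05),
]

for step, logged_lr in LOGGED:
    predicted = warmup_cosine_lr(step - ASSUMED_LAG)
    print(f"step {step}: logged {logged_lr:.6e}, sketch {predicted:.6e}")
```

If the assumptions hold, the sketch reproduces the logged values closely; a mismatch at other steps would indicate that the inferred settings are wrong rather than that the log is.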
checkpoint-1500/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: mistralai/Mistral-7B-Instruct-v0.3
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.14.0
checkpoint-1500/adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+ "bias": "none",
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "o_proj",
+ "up_proj",
+ "gate_proj",
+ "q_proj",
+ "k_proj",
+ "v_proj",
+ "down_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
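The adapter settings above determine the LoRA update scale and the number of trainable parameters. The sketch below works through that arithmetic. The layer shapes (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 decoder layers) are assumptions taken from the base model's published configuration rather than from this diff, and the function name `lora_param_count` is hypothetical.

```python
from typing import Dict, Tuple

# Values read from adapter_config.json above.
LORA_R = 16
LORA_ALPHA = 32

# (in_features, out_features) for each target module.
# ASSUMPTION: shapes of Mistral-7B linear layers, not stated in this diff.
ASSUMED_SHAPES: Dict[str, Tuple[int, int]] = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
ASSUMED_NUM_LAYERS = 32  # ASSUMPTION: decoder layer count of the base model


def lora_param_count(
    shapes: Dict[str, Tuple[int, int]], rank: int, num_layers: int
) -> int:
    """Each adapted layer adds A (rank x in) and B (out x rank) matrices."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


# With use_rslora set to false, the update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / LORA_R
n_params = lora_param_count(ASSUMED_SHAPES, LORA_R, ASSUMED_NUM_LAYERS)
bytes_at_4_per_param = n_params * 4

print(f"scaling: {scaling}")
print(f"trainable LoRA parameters: {n_params}")
print(f"size at 4 bytes per parameter: {bytes_at_4_per_param}")
```

Under these assumptions the adapter has 41,943,040 parameters, or 167,772,160 bytes at 4 bytes each. That is close to the 167,832,240-byte `adapter_model.safetensors` recorded for this checkpoint; the small remainder would plausibly be file metadata, although the diff does not confirm the storage precision.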
checkpoint-1500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e727b7c8b56abbc5942fdfde2dbc5e375b0db508fc8ac7b2cad727cb7b17880f
+ size 167832240
checkpoint-1500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9e8d62a362ad67f9bc9793c5e8cf130f5a0e8bce9cead1a377b21c539246b14
+ size 335922386
checkpoint-1500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:161cf2545ae205d8a7f1a7656c017af30e1e56e5150f9c795f10a302fbc0943e
+ size 14244
checkpoint-1500/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:495cba9ae11960da6b4497a76242a85485dd5b067748b8f92d20b7738cee245a
+ size 988
checkpoint-1500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d4e03022bce0c02012e522e66435423a73932b4197057e2544fe60d66fe1eff
+ size 1064
checkpoint-1500/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "</s>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1500/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1500/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+ size 587404
checkpoint-1500/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1500/trainer_state.json ADDED
@@ -0,0 +1,1213 @@
1
+ {
2
+ "best_global_step": 300,
3
+ "best_metric": 1.0371309518814087,
4
+ "best_model_checkpoint": "./results/survey-finetuned-Mistral-7B-Instruct-v0.3/checkpoint-300",
5
+ "epoch": 4.201680672268908,
6
+ "eval_steps": 100,
7
+ "global_step": 1500,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.028011204481792718,
14
+ "grad_norm": 5.871906757354736,
15
+ "learning_rate": 2.962962962962963e-05,
16
+ "loss": 2.3552,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.056022408963585436,
21
+ "grad_norm": 2.389425039291382,
22
+ "learning_rate": 6.296296296296296e-05,
23
+ "loss": 1.7094,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.08403361344537816,
28
+ "grad_norm": 2.160494804382324,
29
+ "learning_rate": 0.0001,
30
+ "loss": 1.4552,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.11204481792717087,
35
+ "grad_norm": 2.3963708877563477,
36
+ "learning_rate": 0.00013703703703703705,
37
+ "loss": 1.2928,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.1400560224089636,
42
+ "grad_norm": 2.192927598953247,
43
+ "learning_rate": 0.00017407407407407408,
44
+ "loss": 1.1542,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.16806722689075632,
49
+ "grad_norm": 1.6872493028640747,
50
+ "learning_rate": 0.00019999851776504612,
51
+ "loss": 1.1238,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.19607843137254902,
56
+ "grad_norm": 1.8370214700698853,
57
+ "learning_rate": 0.00019997216814377874,
58
+ "loss": 1.1234,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.22408963585434175,
63
+ "grad_norm": 1.7727515697479248,
64
+ "learning_rate": 0.0001999128899579094,
65
+ "loss": 1.1143,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.25210084033613445,
70
+ "grad_norm": 1.5649667978286743,
71
+ "learning_rate": 0.00019982070273232797,
72
+ "loss": 1.112,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.2801120448179272,
77
+ "grad_norm": 1.5107511281967163,
78
+ "learning_rate": 0.0001996956368314155,
79
+ "loss": 1.1154,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.2801120448179272,
84
+ "eval_loss": 1.0852699279785156,
85
+ "eval_runtime": 25.3067,
86
+ "eval_samples_per_second": 14.107,
87
+ "eval_steps_per_second": 1.778,
88
+ "step": 100
89
+ },
90
+ {
91
+ "epoch": 0.3081232492997199,
92
+ "grad_norm": 1.5419102907180786,
93
+ "learning_rate": 0.00019953773344904302,
94
+ "loss": 1.0686,
95
+ "step": 110
96
+ },
97
+ {
98
+ "epoch": 0.33613445378151263,
99
+ "grad_norm": 1.5007447004318237,
100
+ "learning_rate": 0.00019934704459500312,
101
+ "loss": 1.0849,
102
+ "step": 120
103
+ },
104
+ {
105
+ "epoch": 0.3641456582633053,
106
+ "grad_norm": 1.67251455783844,
107
+ "learning_rate": 0.000199123633077879,
108
+ "loss": 1.0793,
109
+ "step": 130
110
+ },
111
+ {
112
+ "epoch": 0.39215686274509803,
113
+ "grad_norm": 1.4843816757202148,
114
+ "learning_rate": 0.00019886757248435704,
115
+ "loss": 1.0122,
116
+ "step": 140
117
+ },
118
+ {
119
+ "epoch": 0.42016806722689076,
120
+ "grad_norm": 1.4718296527862549,
121
+ "learning_rate": 0.00019857894715498873,
122
+ "loss": 1.0423,
123
+ "step": 150
124
+ },
125
+ {
126
+ "epoch": 0.4481792717086835,
127
+ "grad_norm": 1.4704537391662598,
128
+ "learning_rate": 0.00019825785215641086,
129
+ "loss": 1.0524,
130
+ "step": 160
131
+ },
132
+ {
133
+ "epoch": 0.47619047619047616,
134
+ "grad_norm": 1.4213836193084717,
135
+ "learning_rate": 0.0001979043932500329,
136
+ "loss": 1.0373,
137
+ "step": 170
138
+ },
139
+ {
140
+ "epoch": 0.5042016806722689,
141
+ "grad_norm": 1.579439401626587,
142
+ "learning_rate": 0.00019751868685720134,
143
+ "loss": 1.0539,
144
+ "step": 180
145
+ },
146
+ {
147
+ "epoch": 0.5322128851540616,
148
+ "grad_norm": 1.4878956079483032,
149
+ "learning_rate": 0.00019710086002085335,
150
+ "loss": 1.0315,
151
+ "step": 190
152
+ },
153
+ {
154
+ "epoch": 0.5602240896358543,
155
+ "grad_norm": 1.5998858213424683,
156
+ "learning_rate": 0.0001966510503636716,
157
+ "loss": 1.038,
158
+ "step": 200
159
+ },
160
+ {
161
+ "epoch": 0.5602240896358543,
162
+ "eval_loss": 1.0578807592391968,
163
+ "eval_runtime": 25.2375,
164
+ "eval_samples_per_second": 14.146,
165
+ "eval_steps_per_second": 1.783,
166
+ "step": 200
167
+ },
168
+ {
169
+ "epoch": 0.5882352941176471,
170
+ "grad_norm": 1.5214334726333618,
171
+ "learning_rate": 0.00019616940604275454,
172
+ "loss": 1.0515,
173
+ "step": 210
174
+ },
175
+ {
176
+ "epoch": 0.6162464985994398,
177
+ "grad_norm": 1.7388910055160522,
178
+ "learning_rate": 0.00019565608570081672,
179
+ "loss": 1.07,
180
+ "step": 220
181
+ },
182
+ {
183
+ "epoch": 0.6442577030812325,
184
+ "grad_norm": 1.5043017864227295,
185
+ "learning_rate": 0.00019511125841393558,
186
+ "loss": 1.0545,
187
+ "step": 230
188
+ },
189
+ {
190
+ "epoch": 0.6722689075630253,
191
+ "grad_norm": 1.5737371444702148,
192
+ "learning_rate": 0.00019453510363586155,
193
+ "loss": 1.0546,
194
+ "step": 240
195
+ },
196
+ {
197
+ "epoch": 0.7002801120448179,
198
+ "grad_norm": 1.5473017692565918,
199
+ "learning_rate": 0.00019392781113891007,
200
+ "loss": 1.0306,
201
+ "step": 250
202
+ },
203
+ {
204
+ "epoch": 0.7282913165266106,
205
+ "grad_norm": 1.6050424575805664,
206
+ "learning_rate": 0.00019328958095145496,
207
+ "loss": 1.0285,
208
+ "step": 260
209
+ },
210
+ {
211
+ "epoch": 0.7563025210084033,
212
+ "grad_norm": 1.4448118209838867,
213
+ "learning_rate": 0.00019262062329204356,
214
+ "loss": 1.0444,
215
+ "step": 270
216
+ },
217
+ {
218
+ "epoch": 0.7843137254901961,
219
+ "grad_norm": 1.7479037046432495,
220
+ "learning_rate": 0.0001919211585001554,
221
+ "loss": 1.0341,
222
+ "step": 280
223
+ },
224
+ {
225
+ "epoch": 0.8123249299719888,
226
+ "grad_norm": 1.4101306200027466,
227
+ "learning_rate": 0.00019119141696362759,
228
+ "loss": 1.0244,
229
+ "step": 290
230
+ },
231
+ {
232
+ "epoch": 0.8403361344537815,
233
+ "grad_norm": 1.4832830429077148,
234
+ "learning_rate": 0.00019043163904277016,
235
+ "loss": 1.0407,
236
+ "step": 300
237
+ },
238
+ {
239
+ "epoch": 0.8403361344537815,
240
+ "eval_loss": 1.0371309518814087,
241
+ "eval_runtime": 25.2977,
242
+ "eval_samples_per_second": 14.112,
243
+ "eval_steps_per_second": 1.779,
244
+ "step": 300
245
+ },
246
+ {
247
+ "epoch": 0.8683473389355743,
248
+ "grad_norm": 1.4755833148956299,
249
+ "learning_rate": 0.00018964207499119686,
250
+ "loss": 1.0462,
251
+ "step": 310
252
+ },
253
+ {
254
+ "epoch": 0.896358543417367,
255
+ "grad_norm": 1.4913283586502075,
256
+ "learning_rate": 0.0001888229848733972,
257
+ "loss": 1.0474,
258
+ "step": 320
259
+ },
260
+ {
261
+ "epoch": 0.9243697478991597,
262
+ "grad_norm": 1.7317923307418823,
263
+ "learning_rate": 0.0001879746384790773,
264
+ "loss": 1.0447,
265
+ "step": 330
266
+ },
267
+ {
268
+ "epoch": 0.9523809523809523,
269
+ "grad_norm": 1.5321850776672363,
270
+ "learning_rate": 0.0001870973152342971,
271
+ "loss": 1.0225,
272
+ "step": 340
273
+ },
274
+ {
275
+ "epoch": 0.9803921568627451,
276
+ "grad_norm": 1.5264012813568115,
277
+ "learning_rate": 0.00018619130410943386,
278
+ "loss": 1.0527,
279
+ "step": 350
280
+ },
281
+ {
282
+ "epoch": 1.0084033613445378,
283
+ "grad_norm": 1.5999464988708496,
284
+ "learning_rate": 0.0001852569035240018,
285
+ "loss": 0.957,
286
+ "step": 360
287
+ },
288
+ {
289
+ "epoch": 1.0364145658263306,
290
+ "grad_norm": 1.4261401891708374,
291
+ "learning_rate": 0.00018429442124835954,
292
+ "loss": 0.6759,
293
+ "step": 370
294
+ },
295
+ {
296
+ "epoch": 1.0644257703081232,
297
+ "grad_norm": 1.6350306272506714,
298
+ "learning_rate": 0.00018330417430233756,
299
+ "loss": 0.6721,
300
+ "step": 380
301
+ },
302
+ {
303
+ "epoch": 1.092436974789916,
304
+ "grad_norm": 1.4598591327667236,
305
+ "learning_rate": 0.00018228648885081886,
306
+ "loss": 0.6695,
307
+ "step": 390
308
+ },
309
+ {
310
+ "epoch": 1.1204481792717087,
311
+ "grad_norm": 1.6912418603897095,
312
+ "learning_rate": 0.00018124170009630787,
313
+ "loss": 0.6482,
314
+ "step": 400
315
+ },
316
+ {
317
+ "epoch": 1.1204481792717087,
318
+ "eval_loss": 1.0602205991744995,
319
+ "eval_runtime": 25.4667,
320
+ "eval_samples_per_second": 14.018,
321
+ "eval_steps_per_second": 1.767,
322
+ "step": 400
323
+ },
324
+ {
325
+ "epoch": 1.1484593837535013,
326
+ "grad_norm": 1.6862925291061401,
327
+ "learning_rate": 0.0001801701521685223,
328
+ "loss": 0.6921,
329
+ "step": 410
330
+ },
331
+ {
332
+ "epoch": 1.1764705882352942,
333
+ "grad_norm": 1.8303461074829102,
334
+ "learning_rate": 0.00017907219801104458,
335
+ "loss": 0.6826,
336
+ "step": 420
337
+ },
338
+ {
339
+ "epoch": 1.2044817927170868,
340
+ "grad_norm": 2.0269150733947754,
341
+ "learning_rate": 0.00017794819926507014,
342
+ "loss": 0.6765,
343
+ "step": 430
344
+ },
345
+ {
346
+ "epoch": 1.2324929971988796,
347
+ "grad_norm": 1.7937819957733154,
348
+ "learning_rate": 0.00017679852615029123,
349
+ "loss": 0.6884,
350
+ "step": 440
351
+ },
352
+ {
353
+ "epoch": 1.2605042016806722,
354
+ "grad_norm": 1.731198787689209,
355
+ "learning_rate": 0.00017562355734295477,
356
+ "loss": 0.6783,
357
+ "step": 450
358
+ },
359
+ {
360
+ "epoch": 1.2885154061624648,
361
+ "grad_norm": 1.8679757118225098,
362
+ "learning_rate": 0.00017442367985113504,
363
+ "loss": 0.6689,
364
+ "step": 460
365
+ },
366
+ {
367
+ "epoch": 1.3165266106442577,
368
+ "grad_norm": 1.6848735809326172,
369
+ "learning_rate": 0.0001731992888872621,
370
+ "loss": 0.6937,
371
+ "step": 470
372
+ },
373
+ {
374
+ "epoch": 1.3445378151260505,
375
+ "grad_norm": 1.5047993659973145,
376
+ "learning_rate": 0.00017195078773794755,
377
+ "loss": 0.6919,
378
+ "step": 480
379
+ },
380
+ {
381
+ "epoch": 1.3725490196078431,
382
+ "grad_norm": 1.7145918607711792,
383
+ "learning_rate": 0.00017067858763115132,
384
+ "loss": 0.6694,
385
+ "step": 490
386
+ },
387
+ {
388
+ "epoch": 1.4005602240896358,
389
+ "grad_norm": 1.6695507764816284,
390
+ "learning_rate": 0.00016938310760073236,
391
+ "loss": 0.6665,
392
+ "step": 500
393
+ },
394
+ {
395
+ "epoch": 1.4005602240896358,
396
+ "eval_loss": 1.0960317850112915,
397
+ "eval_runtime": 25.678,
398
+ "eval_samples_per_second": 13.903,
399
+ "eval_steps_per_second": 1.752,
400
+ "step": 500
401
+ },
402
+ {
403
+ "epoch": 1.4285714285714286,
404
+ "grad_norm": 1.9052048921585083,
405
+ "learning_rate": 0.00016806477434842836,
406
+ "loss": 0.6711,
407
+ "step": 510
408
+ },
409
+ {
410
+ "epoch": 1.4565826330532212,
411
+ "grad_norm": 1.7252157926559448,
412
+ "learning_rate": 0.00016672402210331,
413
+ "loss": 0.7007,
414
+ "step": 520
415
+ },
416
+ {
417
+ "epoch": 1.484593837535014,
418
+ "grad_norm": 1.7970322370529175,
419
+ "learning_rate": 0.00016536129247875567,
420
+ "loss": 0.7164,
421
+ "step": 530
422
+ },
423
+ {
424
+ "epoch": 1.5126050420168067,
425
+ "grad_norm": 1.7098366022109985,
426
+ "learning_rate": 0.00016397703432699404,
427
+ "loss": 0.7117,
428
+ "step": 540
429
+ },
430
+ {
431
+ "epoch": 1.5406162464985993,
432
+ "grad_norm": 1.853852391242981,
433
+ "learning_rate": 0.00016257170359126237,
434
+ "loss": 0.7123,
435
+ "step": 550
436
+ },
437
+ {
438
+ "epoch": 1.5686274509803921,
439
+ "grad_norm": 2.119030714035034,
440
+ "learning_rate": 0.000161145763155629,
441
+ "loss": 0.6952,
442
+ "step": 560
443
+ },
444
+ {
445
+ "epoch": 1.596638655462185,
446
+ "grad_norm": 1.7882400751113892,
447
+ "learning_rate": 0.00015969968269253013,
448
+ "loss": 0.6957,
449
+ "step": 570
450
+ },
451
+ {
452
+ "epoch": 1.6246498599439776,
453
+ "grad_norm": 2.0184831619262695,
454
+ "learning_rate": 0.00015823393850806998,
455
+ "loss": 0.6972,
456
+ "step": 580
457
+ },
458
+ {
459
+ "epoch": 1.6526610644257702,
460
+ "grad_norm": 1.5946452617645264,
461
+ "learning_rate": 0.00015674901338513672,
462
+ "loss": 0.6888,
463
+ "step": 590
464
+ },
465
+ {
466
+ "epoch": 1.680672268907563,
467
+ "grad_norm": 1.7266991138458252,
468
+ "learning_rate": 0.0001552453964243845,
469
+ "loss": 0.7029,
470
+ "step": 600
471
+ },
472
+ {
473
+ "epoch": 1.680672268907563,
474
+ "eval_loss": 1.0715969800949097,
475
+ "eval_runtime": 25.7893,
476
+ "eval_samples_per_second": 13.843,
477
+ "eval_steps_per_second": 1.745,
478
+ "step": 600
479
+ },
480
+ {
481
+ "epoch": 1.708683473389356,
482
+ "grad_norm": 1.7430881261825562,
483
+ "learning_rate": 0.000153723582883135,
484
+ "loss": 0.7032,
485
+ "step": 610
486
+ },
487
+ {
488
+ "epoch": 1.7366946778711485,
489
+ "grad_norm": 1.88009774684906,
490
+ "learning_rate": 0.00015218407401225057,
491
+ "loss": 0.6902,
492
+ "step": 620
493
+ },
494
+ {
495
+ "epoch": 1.7647058823529411,
496
+ "grad_norm": 1.7102521657943726,
497
+ "learning_rate": 0.000150627376891034,
498
+ "loss": 0.6866,
499
+ "step": 630
500
+ },
501
+ {
502
+ "epoch": 1.7927170868347337,
503
+ "grad_norm": 1.854864239692688,
504
+ "learning_rate": 0.0001490540042602077,
505
+ "loss": 0.6944,
506
+ "step": 640
507
+ },
508
+ {
509
+ "epoch": 1.8207282913165266,
510
+ "grad_norm": 1.61717689037323,
511
+ "learning_rate": 0.0001474644743530288,
512
+ "loss": 0.7043,
513
+ "step": 650
514
+ },
515
+ {
516
+ "epoch": 1.8487394957983194,
517
+ "grad_norm": 1.7749794721603394,
518
+ "learning_rate": 0.00014585931072459474,
519
+ "loss": 0.7008,
520
+ "step": 660
521
+ },
522
+ {
523
+ "epoch": 1.876750700280112,
524
+ "grad_norm": 1.7132004499435425,
525
+ "learning_rate": 0.00014423904207939615,
526
+ "loss": 0.7167,
527
+ "step": 670
528
+ },
529
+ {
530
+ "epoch": 1.9047619047619047,
531
+ "grad_norm": 1.784264326095581,
532
+ "learning_rate": 0.0001426042020971736,
533
+ "loss": 0.7203,
534
+ "step": 680
535
+ },
536
+ {
537
+ "epoch": 1.9327731092436975,
538
+ "grad_norm": 1.8548870086669922,
539
+ "learning_rate": 0.00014095532925713553,
540
+ "loss": 0.7198,
541
+ "step": 690
542
+ },
543
+ {
544
+ "epoch": 1.9607843137254903,
545
+ "grad_norm": 1.4964855909347534,
546
+ "learning_rate": 0.00013929296666059544,
547
+ "loss": 0.7053,
548
+ "step": 700
549
+ },
550
+ {
551
+ "epoch": 1.9607843137254903,
552
+ "eval_loss": 1.0620192289352417,
553
+ "eval_runtime": 25.7481,
554
+ "eval_samples_per_second": 13.865,
555
+ "eval_steps_per_second": 1.748,
556
+ "step": 700
557
+ },
558
+ {
559
+ "epoch": 1.988795518207283,
560
+ "grad_norm": 1.7073538303375244,
561
+ "learning_rate": 0.00013761766185208663,
562
+ "loss": 0.7338,
563
+ "step": 710
564
+ },
565
+ {
566
+ "epoch": 2.0168067226890756,
567
+ "grad_norm": 1.4009623527526855,
568
+ "learning_rate": 0.00013592996663901354,
569
+ "loss": 0.4904,
570
+ "step": 720
571
+ },
572
+ {
573
+ "epoch": 2.044817927170868,
574
+ "grad_norm": 1.6378936767578125,
575
+ "learning_rate": 0.00013423043690989874,
576
+ "loss": 0.3822,
577
+ "step": 730
578
+ },
579
+ {
580
+ "epoch": 2.0728291316526612,
581
+ "grad_norm": 1.8590551614761353,
582
+ "learning_rate": 0.00013251963245128603,
583
+ "loss": 0.3615,
584
+ "step": 740
585
+ },
586
+ {
587
+ "epoch": 2.100840336134454,
588
+ "grad_norm": 1.7983025312423706,
589
+ "learning_rate": 0.00013079811676335935,
590
+ "loss": 0.356,
591
+ "step": 750
592
+ },
593
+ {
594
+ "epoch": 2.1288515406162465,
595
+ "grad_norm": 2.0292270183563232,
596
+ "learning_rate": 0.00012906645687433857,
597
+ "loss": 0.3672,
598
+ "step": 760
599
+ },
600
+ {
601
+ "epoch": 2.156862745098039,
602
+ "grad_norm": 1.774778127670288,
603
+ "learning_rate": 0.00012732522315371338,
604
+ "loss": 0.3758,
605
+ "step": 770
606
+ },
607
+ {
608
+ "epoch": 2.184873949579832,
609
+ "grad_norm": 1.9023045301437378,
610
+ "learning_rate": 0.0001255749891243764,
611
+ "loss": 0.3674,
612
+ "step": 780
613
+ },
614
+ {
615
+ "epoch": 2.212885154061625,
616
+ "grad_norm": 1.4395744800567627,
617
+ "learning_rate": 0.00012381633127371783,
618
+ "loss": 0.3723,
619
+ "step": 790
620
+ },
621
+ {
622
+ "epoch": 2.2408963585434174,
623
+ "grad_norm": 1.6131172180175781,
624
+ "learning_rate": 0.00012204982886374348,
625
+ "loss": 0.3699,
626
+ "step": 800
627
+ },
628
+ {
629
+ "epoch": 2.2408963585434174,
630
+ "eval_loss": 1.2616853713989258,
631
+ "eval_runtime": 26.08,
632
+ "eval_samples_per_second": 13.689,
633
+ "eval_steps_per_second": 1.725,
634
+ "step": 800
635
+ },
636
+ {
637
+ "epoch": 2.26890756302521,
638
+ "grad_norm": 1.782162070274353,
639
+ "learning_rate": 0.00012027606374027915,
640
+ "loss": 0.3834,
641
+ "step": 810
642
+ },
643
+ {
644
+ "epoch": 2.2969187675070026,
645
+ "grad_norm": 1.7493244409561157,
646
+ "learning_rate": 0.00011849562014132367,
647
+ "loss": 0.3722,
648
+ "step": 820
649
+ },
650
+ {
651
+ "epoch": 2.3249299719887957,
652
+ "grad_norm": 1.5457279682159424,
653
+ "learning_rate": 0.00011670908450461421,
654
+ "loss": 0.3836,
655
+ "step": 830
656
+ },
657
+ {
658
+ "epoch": 2.3529411764705883,
659
+ "grad_norm": 1.914865255355835,
660
+ "learning_rate": 0.00011491704527446705,
661
+ "loss": 0.3929,
662
+ "step": 840
663
+ },
664
+ {
665
+ "epoch": 2.380952380952381,
666
+ "grad_norm": 1.9673585891723633,
667
+ "learning_rate": 0.00011312009270795729,
668
+ "loss": 0.3736,
669
+ "step": 850
670
+ },
671
+ {
672
+ "epoch": 2.4089635854341735,
673
+ "grad_norm": 1.508853554725647,
674
+ "learning_rate": 0.00011131881868050152,
675
+ "loss": 0.3818,
676
+ "step": 860
677
+ },
678
+ {
679
+ "epoch": 2.4369747899159666,
680
+ "grad_norm": 1.633651852607727,
681
+ "learning_rate": 0.00010951381649090766,
682
+ "loss": 0.3706,
683
+ "step": 870
684
+ },
685
+ {
686
+ "epoch": 2.4649859943977592,
687
+ "grad_norm": 1.7917840480804443,
688
+ "learning_rate": 0.00010770568066595556,
689
+ "loss": 0.372,
690
+ "step": 880
691
+ },
692
+ {
693
+ "epoch": 2.492997198879552,
694
+ "grad_norm": 1.7907308340072632,
695
+ "learning_rate": 0.00010589500676457358,
696
+ "loss": 0.392,
697
+ "step": 890
698
+ },
699
+ {
700
+ "epoch": 2.5210084033613445,
701
+ "grad_norm": 1.6397713422775269,
702
+ "learning_rate": 0.00010408239118167482,
703
+ "loss": 0.3866,
704
+ "step": 900
705
+ },
706
+ {
707
+ "epoch": 2.5210084033613445,
708
+ "eval_loss": 1.254116177558899,
709
+ "eval_runtime": 26.2156,
710
+ "eval_samples_per_second": 13.618,
711
+ "eval_steps_per_second": 1.717,
712
+ "step": 900
713
+ },
714
+ {
715
+ "epoch": 2.549019607843137,
716
+ "grad_norm": 1.613955020904541,
717
+ "learning_rate": 0.00010226843095171827,
718
+ "loss": 0.3973,
719
+ "step": 910
720
+ },
721
+ {
722
+ "epoch": 2.5770308123249297,
723
+ "grad_norm": 1.7094041109085083,
724
+ "learning_rate": 0.00010045372355205915,
725
+ "loss": 0.3697,
726
+ "step": 920
727
+ },
728
+ {
729
+ "epoch": 2.6050420168067228,
730
+ "grad_norm": 1.6542613506317139,
731
+ "learning_rate": 9.86388667061534e-05,
732
+ "loss": 0.3773,
733
+ "step": 930
734
+ },
735
+ {
736
+ "epoch": 2.6330532212885154,
737
+ "grad_norm": 1.57974374294281,
738
+ "learning_rate": 9.68244581866812e-05,
739
+ "loss": 0.381,
740
+ "step": 940
741
+ },
742
+ {
743
+ "epoch": 2.661064425770308,
744
+ "grad_norm": 1.8591638803482056,
745
+ "learning_rate": 9.501109561865415e-05,
746
+ "loss": 0.3722,
747
+ "step": 950
748
+ },
749
+ {
750
+ "epoch": 2.689075630252101,
751
+ "grad_norm": 1.8032310009002686,
752
+ "learning_rate": 9.319937628257121e-05,
753
+ "loss": 0.3812,
754
+ "step": 960
755
+ },
756
+ {
757
+ "epoch": 2.7170868347338937,
758
+ "grad_norm": 1.6205816268920898,
759
+ "learning_rate": 9.1389896917688e-05,
760
+ "loss": 0.358,
761
+ "step": 970
762
+ },
763
+ {
764
+ "epoch": 2.7450980392156863,
765
+ "grad_norm": 1.5483547449111938,
766
+ "learning_rate": 8.958325352546433e-05,
767
+ "loss": 0.3829,
768
+ "step": 980
769
+ },
770
+ {
771
+ "epoch": 2.773109243697479,
772
+ "grad_norm": 1.7492879629135132,
773
+ "learning_rate": 8.778004117325508e-05,
774
+ "loss": 0.3832,
775
+ "step": 990
776
+ },
777
+ {
778
+ "epoch": 2.8011204481792715,
779
+ "grad_norm": 1.752061128616333,
780
+ "learning_rate": 8.598085379830831e-05,
781
+ "loss": 0.3708,
782
+ "step": 1000
783
+ },
784
+ {
785
+ "epoch": 2.8011204481792715,
786
+ "eval_loss": 1.242302417755127,
787
+ "eval_runtime": 26.0048,
788
+ "eval_samples_per_second": 13.728,
789
+ "eval_steps_per_second": 1.73,
790
+ "step": 1000
791
+ },
792
+ {
793
+ "epoch": 2.8291316526610646,
794
+ "grad_norm": 1.7680628299713135,
795
+ "learning_rate": 8.418628401213588e-05,
796
+ "loss": 0.3916,
797
+ "step": 1010
798
+ },
799
+ {
800
+ "epoch": 2.857142857142857,
801
+ "grad_norm": 2.042778253555298,
802
+ "learning_rate": 8.239692290532066e-05,
803
+ "loss": 0.375,
804
+ "step": 1020
805
+ },
806
+ {
807
+ "epoch": 2.88515406162465,
808
+ "grad_norm": 1.5636996030807495,
809
+ "learning_rate": 8.061335985282463e-05,
810
+ "loss": 0.371,
811
+ "step": 1030
812
+ },
813
+ {
814
+ "epoch": 2.9131652661064424,
815
+ "grad_norm": 2.1579694747924805,
816
+ "learning_rate": 7.883618231986223e-05,
817
+ "loss": 0.3728,
818
+ "step": 1040
819
+ },
820
+ {
821
+ "epoch": 2.9411764705882355,
822
+ "grad_norm": 1.8203998804092407,
823
+ "learning_rate": 7.706597566840258e-05,
824
+ "loss": 0.3752,
825
+ "step": 1050
826
+ },
827
+ {
828
+ "epoch": 2.969187675070028,
829
+ "grad_norm": 1.7312395572662354,
830
+ "learning_rate": 7.53033229643646e-05,
831
+ "loss": 0.3789,
832
+ "step": 1060
833
+ },
834
+ {
835
+ "epoch": 2.9971988795518207,
836
+ "grad_norm": 1.616680383682251,
837
+ "learning_rate": 7.354880478556818e-05,
838
+ "loss": 0.3682,
839
+ "step": 1070
840
+ },
841
+ {
842
+ "epoch": 3.0252100840336134,
843
+ "grad_norm": 1.6187553405761719,
844
+ "learning_rate": 7.18029990305052e-05,
845
+ "loss": 0.2418,
846
+ "step": 1080
847
+ },
848
+ {
849
+ "epoch": 3.053221288515406,
850
+ "grad_norm": 1.2909393310546875,
851
+ "learning_rate": 7.006648072799272e-05,
852
+ "loss": 0.2199,
853
+ "step": 1090
854
+ },
855
+ {
856
+ "epoch": 3.081232492997199,
857
+ "grad_norm": 1.4550460577011108,
858
+ "learning_rate": 6.833982184777165e-05,
859
+ "loss": 0.2417,
860
+ "step": 1100
861
+ },
862
+ {
863
+ "epoch": 3.081232492997199,
864
+ "eval_loss": 1.3649441003799438,
865
+ "eval_runtime": 25.9266,
866
+ "eval_samples_per_second": 13.77,
867
+ "eval_steps_per_second": 1.736,
868
+ "step": 1100
869
+ },
870
+ {
871
+ "epoch": 3.1092436974789917,
872
+ "grad_norm": 1.4254740476608276,
873
+ "learning_rate": 6.662359111211268e-05,
874
+ "loss": 0.2217,
875
+ "step": 1110
876
+ },
877
+ {
878
+ "epoch": 3.1372549019607843,
879
+ "grad_norm": 1.446631908416748,
880
+ "learning_rate": 6.491835380849238e-05,
881
+ "loss": 0.2312,
882
+ "step": 1120
883
+ },
884
+ {
885
+ "epoch": 3.165266106442577,
886
+ "grad_norm": 1.636902093887329,
887
+ "learning_rate": 6.322467160340011e-05,
888
+ "loss": 0.2215,
889
+ "step": 1130
890
+ },
891
+ {
892
+ "epoch": 3.19327731092437,
893
+ "grad_norm": 1.3956204652786255,
894
+ "learning_rate": 6.154310235733785e-05,
895
+ "loss": 0.2286,
896
+ "step": 1140
897
+ },
898
+ {
899
+ "epoch": 3.2212885154061626,
900
+ "grad_norm": 1.1625617742538452,
901
+ "learning_rate": 5.9874199941073636e-05,
902
+ "loss": 0.2402,
903
+ "step": 1150
904
+ },
905
+ {
906
+ "epoch": 3.249299719887955,
907
+ "grad_norm": 1.3581691980361938,
908
+ "learning_rate": 5.821851405320903e-05,
909
+ "loss": 0.2313,
910
+ "step": 1160
911
+ },
912
+ {
913
+ "epoch": 3.277310924369748,
914
+ "grad_norm": 1.2999801635742188,
915
+ "learning_rate": 5.657659003912104e-05,
916
+ "loss": 0.2251,
917
+ "step": 1170
918
+ },
919
+ {
920
+ "epoch": 3.3053221288515404,
921
+ "grad_norm": 1.4706475734710693,
922
+ "learning_rate": 5.494896871133743e-05,
923
+ "loss": 0.2213,
924
+ "step": 1180
925
+ },
926
+ {
927
+ "epoch": 3.3333333333333335,
928
+ "grad_norm": 1.4545742273330688,
929
+ "learning_rate": 5.3336186171405764e-05,
930
+ "loss": 0.2333,
931
+ "step": 1190
932
+ },
933
+ {
934
+ "epoch": 3.361344537815126,
935
+ "grad_norm": 1.2404979467391968,
936
+ "learning_rate": 5.173877363331311e-05,
937
+ "loss": 0.2211,
938
+ "step": 1200
939
+ },
940
+ {
941
+ "epoch": 3.361344537815126,
942
+ "eval_loss": 1.4273622035980225,
943
+ "eval_runtime": 26.0864,
944
+ "eval_samples_per_second": 13.685,
945
+ "eval_steps_per_second": 1.725,
946
+ "step": 1200
947
+ },
948
+ {
949
+ "epoch": 3.3893557422969187,
950
+ "grad_norm": 1.3979793787002563,
951
+ "learning_rate": 5.0157257248516896e-05,
952
+ "loss": 0.2291,
953
+ "step": 1210
954
+ },
955
+ {
956
+ "epoch": 3.4173669467787113,
957
+ "grad_norm": 1.7199064493179321,
958
+ "learning_rate": 4.859215793264198e-05,
959
+ "loss": 0.229,
960
+ "step": 1220
961
+ },
962
+ {
963
+ "epoch": 3.4453781512605044,
964
+ "grad_norm": 1.4441235065460205,
965
+ "learning_rate": 4.7043991193903244e-05,
966
+ "loss": 0.2329,
967
+ "step": 1230
968
+ },
969
+ {
970
+ "epoch": 3.473389355742297,
971
+ "grad_norm": 1.363365650177002,
972
+ "learning_rate": 4.5513266963308654e-05,
973
+ "loss": 0.2308,
974
+ "step": 1240
975
+ },
976
+ {
977
+ "epoch": 3.5014005602240896,
978
+ "grad_norm": 1.415418267250061,
979
+ "learning_rate": 4.400048942669962e-05,
980
+ "loss": 0.2267,
981
+ "step": 1250
982
+ },
983
+ {
984
+ "epoch": 3.5294117647058822,
985
+ "grad_norm": 1.3714072704315186,
986
+ "learning_rate": 4.25061568586836e-05,
987
+ "loss": 0.2227,
988
+ "step": 1260
989
+ },
990
+ {
991
+ "epoch": 3.557422969187675,
992
+ "grad_norm": 1.2111363410949707,
993
+ "learning_rate": 4.10307614585139e-05,
994
+ "loss": 0.2322,
995
+ "step": 1270
996
+ },
997
+ {
998
+ "epoch": 3.585434173669468,
999
+ "grad_norm": 1.3968279361724854,
1000
+ "learning_rate": 3.957478918797013e-05,
1001
+ "loss": 0.225,
1002
+ "step": 1280
1003
+ },
1004
+ {
1005
+ "epoch": 3.6134453781512605,
1006
+ "grad_norm": 1.4936352968215942,
1007
+ "learning_rate": 3.8138719611293595e-05,
1008
+ "loss": 0.2191,
1009
+ "step": 1290
1010
+ },
1011
+ {
1012
+ "epoch": 3.641456582633053,
1013
+ "grad_norm": 1.3193854093551636,
1014
+ "learning_rate": 3.6723025737229744e-05,
1015
+ "loss": 0.2251,
1016
+ "step": 1300
1017
+ },
1018
+ {
1019
+ "epoch": 3.641456582633053,
1020
+ "eval_loss": 1.4654343128204346,
1021
+ "eval_runtime": 26.0893,
1022
+ "eval_samples_per_second": 13.684,
1023
+ "eval_steps_per_second": 1.725,
1024
+ "step": 1300
1025
+ },
1026
+ {
1027
+ "epoch": 3.669467787114846,
1028
+ "grad_norm": 1.2740771770477295,
1029
+ "learning_rate": 3.532817386322964e-05,
1030
+ "loss": 0.2235,
1031
+ "step": 1310
1032
+ },
1033
+ {
1034
+ "epoch": 3.697478991596639,
1035
+ "grad_norm": 1.1999931335449219,
1036
+ "learning_rate": 3.395462342186244e-05,
1037
+ "loss": 0.2244,
1038
+ "step": 1320
1039
+ },
1040
+ {
1041
+ "epoch": 3.7254901960784315,
1042
+ "grad_norm": 1.4531569480895996,
1043
+ "learning_rate": 3.2602826829488296e-05,
1044
+ "loss": 0.2218,
1045
+ "step": 1330
1046
+ },
1047
+ {
1048
+ "epoch": 3.753501400560224,
1049
+ "grad_norm": 1.5755056142807007,
1050
+ "learning_rate": 3.127322933724317e-05,
1051
+ "loss": 0.2331,
1052
+ "step": 1340
1053
+ },
1054
+ {
1055
+ "epoch": 3.7815126050420167,
1056
+ "grad_norm": 1.2473876476287842,
1057
+ "learning_rate": 2.9966268884382808e-05,
1058
+ "loss": 0.2171,
1059
+ "step": 1350
1060
+ },
1061
+ {
1062
+ "epoch": 3.8095238095238093,
1063
+ "grad_norm": 1.3828644752502441,
1064
+ "learning_rate": 2.8682375954035756e-05,
1065
+ "loss": 0.2269,
1066
+ "step": 1360
1067
+ },
1068
+ {
1069
+ "epoch": 3.8375350140056024,
1070
+ "grad_norm": 1.473433494567871,
1071
+ "learning_rate": 2.7421973431411775e-05,
1072
+ "loss": 0.2121,
1073
+ "step": 1370
1074
+ },
1075
+ {
1076
+ "epoch": 3.865546218487395,
1077
+ "grad_norm": 1.1347614526748657,
1078
+ "learning_rate": 2.618547646451317e-05,
1079
+ "loss": 0.2187,
1080
+ "step": 1380
1081
+ },
1082
+ {
1083
+ "epoch": 3.8935574229691876,
1084
+ "grad_norm": 1.4064631462097168,
1085
+ "learning_rate": 2.497329232739438e-05,
1086
+ "loss": 0.2243,
1087
+ "step": 1390
1088
+ },
1089
+ {
1090
+ "epoch": 3.9215686274509802,
1091
+ "grad_norm": 1.394667148590088,
1092
+ "learning_rate": 2.378582028601529e-05,
1093
+ "loss": 0.222,
1094
+ "step": 1400
1095
+ },
1096
+ {
1097
+ "epoch": 3.9215686274509802,
1098
+ "eval_loss": 1.474218487739563,
1099
+ "eval_runtime": 26.0178,
1100
+ "eval_samples_per_second": 13.721,
1101
+ "eval_steps_per_second": 1.73,
1102
+ "step": 1400
1103
+ },
1104
+ {
1105
+ "epoch": 3.9495798319327733,
1106
+ "grad_norm": 1.4565016031265259,
1107
+ "learning_rate": 2.262345146673186e-05,
1108
+ "loss": 0.2228,
1109
+ "step": 1410
1110
+ },
1111
+ {
1112
+ "epoch": 3.977591036414566,
1113
+ "grad_norm": 0.9852085113525391,
1114
+ "learning_rate": 2.1486568727468115e-05,
1115
+ "loss": 0.2195,
1116
+ "step": 1420
1117
+ },
1118
+ {
1119
+ "epoch": 4.0056022408963585,
1120
+ "grad_norm": 0.557214617729187,
1121
+ "learning_rate": 2.0375546531611455e-05,
1122
+ "loss": 0.2129,
1123
+ "step": 1430
1124
+ },
1125
+ {
1126
+ "epoch": 4.033613445378151,
1127
+ "grad_norm": 0.5626720786094666,
1128
+ "learning_rate": 1.9290750824672766e-05,
1129
+ "loss": 0.169,
1130
+ "step": 1440
1131
+ },
1132
+ {
1133
+ "epoch": 4.061624649859944,
1134
+ "grad_norm": 0.6319034695625305,
1135
+ "learning_rate": 1.823253891375247e-05,
1136
+ "loss": 0.1704,
1137
+ "step": 1450
1138
+ },
1139
+ {
1140
+ "epoch": 4.089635854341736,
1141
+ "grad_norm": 1.131064534187317,
1142
+ "learning_rate": 1.7201259349851416e-05,
1143
+ "loss": 0.167,
1144
+ "step": 1460
1145
+ },
1146
+ {
1147
+ "epoch": 4.117647058823529,
1148
+ "grad_norm": 0.8115044832229614,
1149
+ "learning_rate": 1.619725181306637e-05,
1150
+ "loss": 0.1697,
1151
+ "step": 1470
1152
+ },
1153
+ {
1154
+ "epoch": 4.1456582633053225,
1155
+ "grad_norm": 0.8886174559593201,
1156
+ "learning_rate": 1.5220847000706839e-05,
1157
+ "loss": 0.1724,
1158
+ "step": 1480
1159
+ },
1160
+ {
1161
+ "epoch": 4.173669467787115,
1162
+ "grad_norm": 0.7718709707260132,
1163
+ "learning_rate": 1.4272366518371093e-05,
1164
+ "loss": 0.1701,
1165
+ "step": 1490
1166
+ },
1167
+ {
1168
+ "epoch": 4.201680672268908,
1169
+ "grad_norm": 0.8207479119300842,
1170
+ "learning_rate": 1.3352122774016407e-05,
1171
+ "loss": 0.1658,
1172
+ "step": 1500
1173
+ },
1174
+ {
1175
+ "epoch": 4.201680672268908,
1176
+ "eval_loss": 1.7015750408172607,
1177
+ "eval_runtime": 26.1361,
1178
+ "eval_samples_per_second": 13.659,
1179
+ "eval_steps_per_second": 1.722,
1180
+ "step": 1500
1181
+ }
1182
+ ],
1183
+ "logging_steps": 10,
1184
+ "max_steps": 1785,
1185
+ "num_input_tokens_seen": 0,
1186
+ "num_train_epochs": 5,
1187
+ "save_steps": 100,
1188
+ "stateful_callbacks": {
1189
+ "EarlyStoppingCallback": {
1190
+ "args": {
1191
+ "early_stopping_patience": 12,
1192
+ "early_stopping_threshold": 0.0
1193
+ },
1194
+ "attributes": {
1195
+ "early_stopping_patience_counter": 12
1196
+ }
1197
+ },
1198
+ "TrainerControl": {
1199
+ "args": {
1200
+ "should_epoch_stop": false,
1201
+ "should_evaluate": false,
1202
+ "should_log": false,
1203
+ "should_save": true,
1204
+ "should_training_stop": true
1205
+ },
1206
+ "attributes": {}
1207
+ }
1208
+ },
1209
+ "total_flos": 1.0091048497210982e+17,
1210
+ "train_batch_size": 8,
1211
+ "trial_name": null,
1212
+ "trial_params": null
1213
+ }
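The `log_history` above can be summarized with a short script. The sketch below is a minimal, hedged example and not part of the commit: it copies only the `eval_loss` entries shown in this file (plus one training entry for contrast) into a hypothetical `SAMPLE_STATE` dict, and the helper names `load_state` and `summarize_eval` are illustrative. It assumes lower `eval_loss` is better and an improvement threshold of 0.0, matching the `early_stopping_threshold` recorded above.

```python
import json


def load_state(path):
    """Read a trainer_state.json file from disk (the path is supplied by the caller)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_eval(state):
    """Return the best evaluation step and how many evaluations followed it."""
    # Evaluation entries are the log records that carry an "eval_loss" key.
    evals = [entry for entry in state["log_history"] if "eval_loss" in entry]
    best = min(evals, key=lambda entry: entry["eval_loss"])
    # Every evaluation after the global minimum is a non-improving one.
    evals_since_best = sum(1 for entry in evals if entry["step"] > best["step"])
    return {
        "best_step": best["step"],
        "best_eval_loss": best["eval_loss"],
        "evals_since_best": evals_since_best,
    }


# Evaluation losses copied from the log_history in this file; the first entry
# is a training record, included to show that it is filtered out.
SAMPLE_STATE = {
    "log_history": [
        {"step": 10, "loss": 2.3552},
        {"step": 100, "eval_loss": 1.0852699279785156},
        {"step": 200, "eval_loss": 1.0578807592391968},
        {"step": 300, "eval_loss": 1.0371309518814087},
        {"step": 400, "eval_loss": 1.0602205991744995},
        {"step": 500, "eval_loss": 1.0960317850112915},
        {"step": 600, "eval_loss": 1.0715969800949097},
        {"step": 700, "eval_loss": 1.0620192289352417},
        {"step": 800, "eval_loss": 1.2616853713989258},
        {"step": 900, "eval_loss": 1.254116177558899},
        {"step": 1000, "eval_loss": 1.242302417755127},
        {"step": 1100, "eval_loss": 1.3649441003799438},
        {"step": 1200, "eval_loss": 1.4273622035980225},
        {"step": 1300, "eval_loss": 1.4654343128204346},
        {"step": 1400, "eval_loss": 1.474218487739563},
        {"step": 1500, "eval_loss": 1.7015750408172607},
    ]
}

summary = summarize_eval(SAMPLE_STATE)
print(summary)
```

On these values the lowest `eval_loss` is at step 300, and twelve evaluations follow it, which lines up with `early_stopping_patience_counter` reaching 12 and `should_training_stop` being true at step 1500. To run it against a real file, `summarize_eval(load_state("checkpoint-1500/trainer_state.json"))` should behave the same way, assuming that path exists locally.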
checkpoint-1500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e4f1545faf17e9e0c7738e6282a9a7eae62ded595af804131816e84a7e34c23
+ size 5368
checkpoint-300/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: mistralai/Mistral-7B-Instruct-v0.3
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
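The card leaves its getting-started code unfilled. As an editorial illustration only (not part of the committed README), the sketch below shows one common way to attach a PEFT adapter to the base model named in the card's front matter. `ADAPTER_PATH` is a hypothetical local directory, the function names are illustrative, and the default (unquantized) loading settings are an assumption, since the commit does not say how the adapter is meant to be loaded.

```python
# Base model ID taken from the card's front matter.
BASE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"
# Hypothetical local path to a directory holding adapter_config.json
# and adapter_model.safetensors.
ADAPTER_PATH = "checkpoint-300"


def load_adapter_model(base_model_id=BASE_MODEL_ID, adapter_path=ADAPTER_PATH):
    """Load the base model and wrap it with the adapter weights."""
    # Imported lazily so the module can be inspected without the heavy
    # dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_path)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def generate_reply(tokenizer, model, prompt, max_new_tokens=128):
    """Format a single user turn with the chat template and decode the reply."""
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

A call such as `tokenizer, model = load_adapter_model()` followed by `generate_reply(tokenizer, model, "Hello")` would download the base weights on first use. Memory requirements for a 7B model are substantial, so quantized or half-precision loading may be preferable in practice.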
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.14.0
checkpoint-300/adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+   "bias": "none",
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "o_proj",
+     "up_proj",
+     "gate_proj",
+     "q_proj",
+     "k_proj",
+     "v_proj",
+     "down_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
checkpoint-300/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67780d5284cd1c059d99826e0cec6b5f8e1aa17cc818632dbf1f772a9ed7c594
+ size 167832240
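
The adapter size above can be cross-checked against the LoRA settings in `adapter_config.json` (`r` = 16, seven target modules) and the layer dimensions in the `config.json` added by the same commit. The sketch below is a back-of-the-envelope estimate, not something the commit itself computes. It assumes the standard LoRA layout (one `r × in` matrix and one `out × r` matrix per targeted layer, no bias terms) and fp32 storage. The names `TARGET_SHAPES` and `lora_param_count` are illustrative only.

```python
# Dimensions taken from config.json in this commit.
HIDDEN = 4096            # hidden_size
INTERMEDIATE = 14336     # intermediate_size
KV_DIM = 8 * 128         # num_key_value_heads * head_dim
NUM_LAYERS = 32          # num_hidden_layers

# LoRA settings taken from adapter_config.json.
RANK = 16                # "r"
LORA_ALPHA = 32          # "lora_alpha"

# (in_features, out_features) of each targeted linear layer.
# Assumption: the usual Mistral projection shapes, without bias.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Count LoRA parameters: rank * (fan_in + fan_out) per targeted layer."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


n_params = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
fp32_bytes = n_params * 4          # assumption: weights stored as float32
scaling = LORA_ALPHA / RANK        # factor applied to the low-rank update

print(n_params, fp32_bytes, scaling)
```

Under these assumptions the estimate is 41,943,040 parameters, or 167,772,160 bytes, which is about 60 KB less than the 167,832,240 bytes recorded in the pointer file. The remainder is plausibly file metadata, though the diff does not confirm that.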
checkpoint-300/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0db96968c04f4ec5a8e7abe67b907d7cadc0f99006087d87d15448de890f648e
+ size 335922386
checkpoint-300/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47e2457cae0016bbcda384d81f2b91e8f4e899f8dadfcdfdf008f8936e92324e
+ size 14244
checkpoint-300/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6206ce0a70f6ce599823529d33dc15bcfed77431fdfb979429d85dfc33e2eede
+ size 988
checkpoint-300/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8580816c0ed7ca6e8664cb2fbfe83f21883509719cd34d37cc3c06e8020236a
+ size 1064
checkpoint-300/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-300/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-300/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+ size 587404
checkpoint-300/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-300/trainer_state.json ADDED
@@ -0,0 +1,277 @@
+ {
+   "best_global_step": 300,
+   "best_metric": 1.0371309518814087,
+   "best_model_checkpoint": "./results/survey-finetuned-Mistral-7B-Instruct-v0.3/checkpoint-300",
+   "epoch": 0.8403361344537815,
+   "eval_steps": 100,
+   "global_step": 300,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.028011204481792718,
+       "grad_norm": 5.871906757354736,
+       "learning_rate": 2.962962962962963e-05,
+       "loss": 2.3552,
+       "step": 10
+     },
+     {
+       "epoch": 0.056022408963585436,
+       "grad_norm": 2.389425039291382,
+       "learning_rate": 6.296296296296296e-05,
+       "loss": 1.7094,
+       "step": 20
+     },
+     {
+       "epoch": 0.08403361344537816,
+       "grad_norm": 2.160494804382324,
+       "learning_rate": 0.0001,
+       "loss": 1.4552,
+       "step": 30
+     },
+     {
+       "epoch": 0.11204481792717087,
+       "grad_norm": 2.3963708877563477,
+       "learning_rate": 0.00013703703703703705,
+       "loss": 1.2928,
+       "step": 40
+     },
+     {
+       "epoch": 0.1400560224089636,
+       "grad_norm": 2.192927598953247,
+       "learning_rate": 0.00017407407407407408,
+       "loss": 1.1542,
+       "step": 50
+     },
+     {
+       "epoch": 0.16806722689075632,
+       "grad_norm": 1.6872493028640747,
+       "learning_rate": 0.00019999851776504612,
+       "loss": 1.1238,
+       "step": 60
+     },
+     {
+       "epoch": 0.19607843137254902,
+       "grad_norm": 1.8370214700698853,
+       "learning_rate": 0.00019997216814377874,
+       "loss": 1.1234,
+       "step": 70
+     },
+     {
+       "epoch": 0.22408963585434175,
+       "grad_norm": 1.7727515697479248,
+       "learning_rate": 0.0001999128899579094,
+       "loss": 1.1143,
+       "step": 80
+     },
+     {
+       "epoch": 0.25210084033613445,
+       "grad_norm": 1.5649667978286743,
+       "learning_rate": 0.00019982070273232797,
+       "loss": 1.112,
+       "step": 90
+     },
+     {
+       "epoch": 0.2801120448179272,
+       "grad_norm": 1.5107511281967163,
+       "learning_rate": 0.0001996956368314155,
+       "loss": 1.1154,
+       "step": 100
+     },
+     {
+       "epoch": 0.2801120448179272,
+       "eval_loss": 1.0852699279785156,
+       "eval_runtime": 25.3067,
+       "eval_samples_per_second": 14.107,
+       "eval_steps_per_second": 1.778,
+       "step": 100
+     },
+     {
+       "epoch": 0.3081232492997199,
+       "grad_norm": 1.5419102907180786,
+       "learning_rate": 0.00019953773344904302,
+       "loss": 1.0686,
+       "step": 110
+     },
+     {
+       "epoch": 0.33613445378151263,
+       "grad_norm": 1.5007447004318237,
+       "learning_rate": 0.00019934704459500312,
+       "loss": 1.0849,
+       "step": 120
+     },
+     {
+       "epoch": 0.3641456582633053,
+       "grad_norm": 1.67251455783844,
+       "learning_rate": 0.000199123633077879,
+       "loss": 1.0793,
+       "step": 130
+     },
+     {
+       "epoch": 0.39215686274509803,
+       "grad_norm": 1.4843816757202148,
+       "learning_rate": 0.00019886757248435704,
+       "loss": 1.0122,
+       "step": 140
+     },
+     {
+       "epoch": 0.42016806722689076,
+       "grad_norm": 1.4718296527862549,
+       "learning_rate": 0.00019857894715498873,
+       "loss": 1.0423,
+       "step": 150
+     },
+     {
+       "epoch": 0.4481792717086835,
+       "grad_norm": 1.4704537391662598,
+       "learning_rate": 0.00019825785215641086,
+       "loss": 1.0524,
+       "step": 160
+     },
+     {
+       "epoch": 0.47619047619047616,
+       "grad_norm": 1.4213836193084717,
+       "learning_rate": 0.0001979043932500329,
+       "loss": 1.0373,
+       "step": 170
+     },
+     {
+       "epoch": 0.5042016806722689,
+       "grad_norm": 1.579439401626587,
+       "learning_rate": 0.00019751868685720134,
+       "loss": 1.0539,
+       "step": 180
+     },
+     {
+       "epoch": 0.5322128851540616,
+       "grad_norm": 1.4878956079483032,
+       "learning_rate": 0.00019710086002085335,
+       "loss": 1.0315,
+       "step": 190
+     },
+     {
+       "epoch": 0.5602240896358543,
+       "grad_norm": 1.5998858213424683,
+       "learning_rate": 0.0001966510503636716,
+       "loss": 1.038,
+       "step": 200
+     },
+     {
+       "epoch": 0.5602240896358543,
+       "eval_loss": 1.0578807592391968,
+       "eval_runtime": 25.2375,
+       "eval_samples_per_second": 14.146,
+       "eval_steps_per_second": 1.783,
+       "step": 200
+     },
+     {
+       "epoch": 0.5882352941176471,
+       "grad_norm": 1.5214334726333618,
+       "learning_rate": 0.00019616940604275454,
+       "loss": 1.0515,
+       "step": 210
+     },
+     {
+       "epoch": 0.6162464985994398,
+       "grad_norm": 1.7388910055160522,
+       "learning_rate": 0.00019565608570081672,
+       "loss": 1.07,
+       "step": 220
+     },
+     {
+       "epoch": 0.6442577030812325,
+       "grad_norm": 1.5043017864227295,
+       "learning_rate": 0.00019511125841393558,
+       "loss": 1.0545,
+       "step": 230
+     },
+     {
+       "epoch": 0.6722689075630253,
+       "grad_norm": 1.5737371444702148,
+       "learning_rate": 0.00019453510363586155,
+       "loss": 1.0546,
+       "step": 240
+     },
+     {
+       "epoch": 0.7002801120448179,
+       "grad_norm": 1.5473017692565918,
+       "learning_rate": 0.00019392781113891007,
+       "loss": 1.0306,
+       "step": 250
+     },
+     {
+       "epoch": 0.7282913165266106,
+       "grad_norm": 1.6050424575805664,
+       "learning_rate": 0.00019328958095145496,
+       "loss": 1.0285,
+       "step": 260
+     },
+     {
+       "epoch": 0.7563025210084033,
+       "grad_norm": 1.4448118209838867,
+       "learning_rate": 0.00019262062329204356,
+       "loss": 1.0444,
+       "step": 270
+     },
+     {
+       "epoch": 0.7843137254901961,
+       "grad_norm": 1.7479037046432495,
+       "learning_rate": 0.0001919211585001554,
+       "loss": 1.0341,
+       "step": 280
+     },
+     {
+       "epoch": 0.8123249299719888,
+       "grad_norm": 1.4101306200027466,
+       "learning_rate": 0.00019119141696362759,
+       "loss": 1.0244,
+       "step": 290
+     },
+     {
+       "epoch": 0.8403361344537815,
+       "grad_norm": 1.4832830429077148,
+       "learning_rate": 0.00019043163904277016,
+       "loss": 1.0407,
+       "step": 300
+     },
+     {
+       "epoch": 0.8403361344537815,
+       "eval_loss": 1.0371309518814087,
+       "eval_runtime": 25.2977,
+       "eval_samples_per_second": 14.112,
+       "eval_steps_per_second": 1.779,
+       "step": 300
+     }
+   ],
+   "logging_steps": 10,
+   "max_steps": 1785,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 5,
+   "save_steps": 100,
+   "stateful_callbacks": {
+     "EarlyStoppingCallback": {
+       "args": {
+         "early_stopping_patience": 12,
+         "early_stopping_threshold": 0.0
+       },
+       "attributes": {
+         "early_stopping_patience_counter": 0
+       }
+     },
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 2.004525617774592e+16,
+   "train_batch_size": 8,
+   "trial_name": null,
+   "trial_params": null
+ }
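
The `learning_rate` values in `log_history` are numerically consistent with a linear warmup followed by cosine decay over `max_steps` = 1785. The sketch below reproduces that shape with the standard library only. The peak rate (2e-4) and warmup length (54 steps) are inferred from the logged numbers and are not stated anywhere in the commit. The logged values also appear to lag the global step by two to three scheduler steps (for example, the value logged at step 60 matches scheduler step 57). One possible cause is optimizer steps skipped by the fp16 gradient scaler, but that is a guess. `cosine_lr` is a hypothetical helper, not a library function.

```python
import math

PEAK_LR = 2e-4        # assumption: inferred from the log, not stated in the commit
WARMUP_STEPS = 54     # assumption: inferred from the warmup slope in the log
MAX_STEPS = 1785      # "max_steps" in trainer_state.json


def cosine_lr(scheduler_step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at MAX_STEPS."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / WARMUP_STEPS
    progress = (scheduler_step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against values logged at global steps 30, 60 and 70,
# using the inferred three-step lag between global and scheduler step.
for global_step, logged in [(30, 0.0001),
                            (60, 0.00019999851776504612),
                            (70, 0.00019997216814377874)]:
    print(global_step, cosine_lr(global_step - 3), logged)
```

If the inferred constants are right, the schedule decays to zero at step 1785, so a run halted by the early-stopping callback (patience 12 evaluations) would end at a non-zero learning rate.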
checkpoint-300/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e4f1545faf17e9e0c7738e6282a9a7eae62ded595af804131816e84a7e34c23
+ size 5368
config.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "quantization_config": {
+     "_load_in_4bit": false,
+     "_load_in_8bit": true,
+     "bnb_4bit_compute_dtype": "float32",
+     "bnb_4bit_quant_storage": "uint8",
+     "bnb_4bit_quant_type": "fp4",
+     "bnb_4bit_use_double_quant": false,
+     "llm_int8_enable_fp32_cpu_offload": false,
+     "llm_int8_has_fp16_weight": false,
+     "llm_int8_skip_modules": null,
+     "llm_int8_threshold": 6.0,
+     "load_in_4bit": false,
+     "load_in_8bit": true,
+     "quant_method": "bitsandbytes"
+   },
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.1",
+   "use_cache": true,
+   "vocab_size": 32768
+ }
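
The fields in this `config.json` are enough to estimate the base model's parameter count. The sketch below is a rough calculation, assuming the usual Mistral decoder layout: bias-free linear layers, two RMSNorm weight vectors per layer plus a final norm, and separate input and output embeddings (`tie_word_embeddings` is false). `mistral_param_count` is an illustrative helper, not a library call.

```python
# Values copied from config.json in this commit.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 32768,
}


def mistral_param_count(cfg):
    """Estimate parameters for a bias-free decoder with grouped-query attention."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    attention = h * q_dim + 2 * h * kv_dim + q_dim * h   # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]               # gate, up, down projections
    norms = 2 * h                                        # two RMSNorm weights per layer
    per_layer = attention + mlp + norms

    embeddings = 2 * cfg["vocab_size"] * h               # untied input and output embeddings
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


total = mistral_param_count(cfg)
print(f"{total:,} parameters")
```

Under these assumptions the total is roughly 7.25 billion parameters, so the 41.9 million LoRA parameters implied by `adapter_config.json` amount to well under 1% of the base model.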
evaluation_results.json ADDED
@@ -0,0 +1,125 @@
+ {
+   "overall": {
+     "bleu": 0.07838348900432857,
+     "rouge1": 0.4160263728442303,
+     "rouge2": 0.12298112561122547,
+     "rougeL": 0.22705529378080125,
+     "bertscore_f1": 0.888710463643074,
+     "survey_quality": 0.8665336862809816,
+     "length_sim": 0.8866010588429448,
+     "sentence_count_sim": 0.9129999999999999,
+     "sentiment_sim": 0.8
+   },
+   "domain_averages": {
+     "healthcare": {
+       "bleu": 0.07762374191145698,
+       "rouge1": 0.42243627141751405,
+       "rouge2": 0.12138260144908483,
+       "rougeL": 0.23979926785693628,
+       "bertscore_f1": 0.8950019955635071,
+       "survey_quality": 0.8852262899250766,
+       "length_sim": 0.8856788697752295,
+       "sentence_count_sim": 0.9200000000000002,
+       "sentiment_sim": 0.85
+     },
+     "education": {
+       "bleu": 0.07919017058872602,
+       "rouge1": 0.4403319723577309,
+       "rouge2": 0.1241865105142728,
+       "rougeL": 0.22697034499313967,
+       "bertscore_f1": 0.8952235221862793,
+       "survey_quality": 0.9229134973562958,
+       "length_sim": 0.9287404920688876,
+       "sentence_count_sim": 0.9400000000000001,
+       "sentiment_sim": 0.9
+     },
+     "consumer_preferences": {
+       "bleu": 0.04953106502738471,
+       "rouge1": 0.37131249695672947,
+       "rouge2": 0.09227813132714946,
+       "rougeL": 0.1949837207349378,
+       "bertscore_f1": 0.879725044965744,
+       "survey_quality": 0.8690244742478145,
+       "length_sim": 0.8970734227434429,
+       "sentence_count_sim": 0.9600000000000002,
+       "sentiment_sim": 0.75
+     },
+     "technology": {
+       "bleu": 0.05616084930469599,
+       "rouge1": 0.3952421299607451,
+       "rouge2": 0.1009719226005579,
+       "rougeL": 0.21178035991009433,
+       "bertscore_f1": 0.8853113055229187,
+       "survey_quality": 0.8866657940325023,
+       "length_sim": 0.914997382097507,
+       "sentence_count_sim": 0.8950000000000001,
+       "sentiment_sim": 0.85
+     },
+     "social_issues": {
+       "bleu": 0.10838373156460235,
+       "rouge1": 0.44608640880018047,
+       "rouge2": 0.14331504500490966,
+       "rougeL": 0.2531005998587629,
+       "bertscore_f1": 0.8945880949497222,
+       "survey_quality": 0.8324464426080483,
+       "length_sim": 0.8473393278241452,
+       "sentence_count_sim": 0.9000000000000001,
+       "sentiment_sim": 0.75
+     },
+     "environment": {
+       "bleu": 0.09321296997822225,
+       "rouge1": 0.44902727106616364,
+       "rouge2": 0.14650465867302334,
+       "rougeL": 0.25364371179912665,
+       "bertscore_f1": 0.8934633910655976,
+       "survey_quality": 0.9090866835595361,
+       "length_sim": 0.9005933840119422,
+       "sentence_count_sim": 0.9266666666666667,
+       "sentiment_sim": 0.9
+     },
+     "finance": {
+       "bleu": 0.13880939950659477,
+       "rouge1": 0.43427252204059863,
+       "rouge2": 0.19057459235218183,
+       "rougeL": 0.2678512413926847,
+       "bertscore_f1": 0.8983778953552246,
+       "survey_quality": 0.8420706606186332,
+       "length_sim": 0.8912119818558993,
+       "sentence_count_sim": 0.835,
+       "sentiment_sim": 0.8
+     },
+     "work_experience": {
+       "bleu": 0.06487372536521903,
+       "rouge1": 0.4167909303194882,
+       "rouge2": 0.1182192780555242,
+       "rougeL": 0.21887790025908646,
+       "bertscore_f1": 0.8860084712505341,
+       "survey_quality": 0.8390712140097654,
+       "length_sim": 0.8672136420292962,
+       "sentence_count_sim": 0.9,
+       "sentiment_sim": 0.75
+     },
+     "lifestyle": {
+       "bleu": 0.04366261544127066,
+       "rouge1": 0.37809352760789394,
+       "rouge2": 0.08183308983704551,
+       "rougeL": 0.18753344087026794,
+       "bertscore_f1": 0.8770935416221619,
+       "survey_quality": 0.8502691198859227,
+       "length_sim": 0.8741406929911015,
+       "sentence_count_sim": 0.9266666666666667,
+       "sentiment_sim": 0.75
+     },
+     "demographics": {
+       "bleu": 0.07238662135511283,
+       "rouge1": 0.40667019791525794,
+       "rouge2": 0.11054542629850524,
+       "rougeL": 0.21601235013297587,
+       "bertscore_f1": 0.8823113739490509,
+       "survey_quality": 0.8285626865662209,
+       "length_sim": 0.8590213930319963,
+       "sentence_count_sim": 0.9266666666666665,
+       "sentiment_sim": 0.7
+     }
+   }
+ }
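
The `overall` block in `evaluation_results.json` agrees with an unweighted mean of the ten `domain_averages` entries. That is what one would expect if every domain contributed the same number of samples, or if the scores were macro-averaged. The evaluation script is not part of this diff, so treat this as an observation about the numbers rather than a description of how they were produced. The sketch below checks two metrics, with values copied from the file and an illustrative helper name (`macro_average`).

```python
from statistics import fmean

# Per-domain values copied from evaluation_results.json.
domain_bleu = {
    "healthcare": 0.07762374191145698,
    "education": 0.07919017058872602,
    "consumer_preferences": 0.04953106502738471,
    "technology": 0.05616084930469599,
    "social_issues": 0.10838373156460235,
    "environment": 0.09321296997822225,
    "finance": 0.13880939950659477,
    "work_experience": 0.06487372536521903,
    "lifestyle": 0.04366261544127066,
    "demographics": 0.07238662135511283,
}
domain_sentiment = {
    "healthcare": 0.85, "education": 0.9, "consumer_preferences": 0.75,
    "technology": 0.85, "social_issues": 0.75, "environment": 0.9,
    "finance": 0.8, "work_experience": 0.75, "lifestyle": 0.75,
    "demographics": 0.7,
}
overall = {"bleu": 0.07838348900432857, "sentiment_sim": 0.8}


def macro_average(per_domain):
    """Unweighted mean across domains (every domain counts equally)."""
    return fmean(per_domain.values())


print(macro_average(domain_bleu), overall["bleu"])
print(macro_average(domain_sentiment), overall["sentiment_sim"])
```

Because every domain carries equal weight in such an average, a weak domain such as `lifestyle` (BLEU about 0.044) pulls the overall score down as much as a strong one such as `finance` (about 0.139) pulls it up.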
evaluation_viz/domain_comparison.png ADDED
evaluation_viz/metrics_correlation.png ADDED
evaluation_viz/overall_metrics.png ADDED
evaluation_viz/sample_comparisons.csv ADDED
@@ -0,0 +1,11 @@
+ Domain,Question,Reference,Generated,BLEU,ROUGE-L,Survey Quality
+ healthcare,How often did you get help as soon as you wanted after you pressed the call button?,"As an experienced nurse, I understand how crucial it is for patients to receive timely assistance after pressing the call button, especially those with limited English proficiency who may feel even more vulnerable. In my experience, our team strives to respond promptly to call button alerts, as we recognize that delays can lead to anxiety and discomfort. However, there are times when our response may not be immediate due to high patient volumes or unexpected emergencies. We are always working to improve our processes to ensure that every patient feels heard and receives the help they need as soon as possible. Clear communication and understanding are key, and we make every effort to ensure that language barriers do not impede the quality of care we provide.","nderstand the importance of timely responses when it comes to patient care. In my experience, patients with limited English proficiency often face delays in receiving help, especially when there are language barriers. It's crucial for healthcare providers to ensure that they have the resources and support needed to communicate effectively with patients, so they can receive the help they need promptly. This includes investing in language services and training for staff to enhance their ability to communicate effectively with patients, regardless of their language background.",0.08389948425814907,0.18867924528301885,0.8281879194630872
+ healthcare,How often was the area around your room quiet at night?,"As a medical doctor, I understand the importance of a good night's sleep for overall health, so I value a quiet environment at night. Typically, I would say that the area around my room is quiet most of the time, which is crucial for rest and recovery. However, there are occasional disturbances, such as outside traffic or neighborhood activity. I'm interested in exploring integrative approaches, such as sound therapy or relaxation techniques, to manage the impact of any noise. It's important for me to maintain a peaceful environment to ensure I can be alert and focused for my patients during the day.","l medical doctor, I find it crucial to maintain a peaceful environment for both my patients and myself. A quiet night is essential for restorative sleep, which is vital for overall health. In my experience, the area around my room is usually quite peaceful, allowing me to focus on my practice and rest without disturbance. However, I do notice the occasional noise from late-night activities or city life, which can be a reminder of the need for urban planning that prioritizes peace and wellness. Overall, a peaceful environment is a cornerstone of holistic health, and I make it a priority in my own life.",0.1578953325822474,0.24761904761904763,0.8255208333333334
+ healthcare,"Before giving you any new medicine, how often did hospital staff describe possible side effects in a way you could understand?","As a traditional medical doctor, I recognize the importance of clear communication about potential side effects when prescribing new medications. In my experience, hospital staff generally do a good job of explaining these aspects, but there is always room for improvement. Typically, they provide the necessary information, but the challenge often lies in tailoring the explanation to the patient's level of understanding. I believe it’s crucial to ensure that patients fully grasp the potential risks and benefits, as this empowers them to make informed decisions about their health. Moving forward, I’m interested in exploring how integrative approaches might enhance this communication process, perhaps by incorporating holistic perspectives that consider the patient’s overall well-being.","l medical doctor, I have always valued clear communication, especially when it comes to potential side effects of new medications. In my experience, hospital staff have consistently done a commendable job in explaining the side effects of new treatments in a way that is both understandable and comprehensive. They often use simple language and provide written materials, ensuring that patients can review the information at their own pace. However, I believe that integrating alternative treatments like sound therapy could benefit from similar communication standards. It's crucial that patients fully understand the potential benefits and risks of any treatment, whether it's a conventional medicine or an alternative therapy like sound therapy.",0.13520814569504014,0.30434782608695654,0.7927927927927928
+ healthcare,How often was the area around your room quiet at night?,"As a hospital administrator, ensuring a quiet and restful environment for our patients at night is one of our top priorities. We have invested in various measures, such as soundproofing, staff training, and noise reduction protocols, to minimize disturbances. While there are occasional challenges, such as emergencies or necessary medical interventions, we strive to maintain a peaceful atmosphere. Feedback from our patients generally indicates that the area around their rooms remains quiet most nights, allowing them to rest and recover effectively.","dministrator, I've always prioritized a calm environment for our patients, especially at night when they need restful sleep for recovery. However, with the hustle and bustle of a busy hospital, it can be challenging to ensure that the area around patient rooms is consistently quiet. We've implemented measures like soundproofing and staff training, but there are always areas for improvement. If the software solutions can help reduce noise levels or even monitor them, that would be a significant step forward in enhancing patient care.",0.08703668011357521,0.20359281437125748,0.9878787878787879
+ healthcare,"During this hospital stay, how often did doctors listen carefully to you?","As a specialist doctor, I understand the critical importance of listening carefully to patients, as it forms the foundation for accurate diagnosis and effective treatment plans. In my experience during hospital stays—whether as a patient or observing colleagues—I have always noted that the majority of doctors make a concerted effort to listen to their patients, although time constraints can sometimes be a challenge. Personally, I prioritize active listening, ensuring that my patients feel heard and understood, because it not only builds trust but also often reveals crucial information that might not come up otherwise. However, I recognize that the level of attentiveness can vary among practitioners and situations.","doctor, I've always placed a high priority on clear and effective communication, both with patients and colleagues. During this hospital stay, I found that doctors consistently took the time to listen carefully to my input. They demonstrated a genuine interest in understanding the latest pharmaceutical products and how they can enhance patient care. This collaboration is crucial for ensuring that we can make informed decisions that benefit our patients, and I appreciate the focus on shared learning and expertise. Overall, the discussions were productive and informative, highlighting the importance of continued collaboration in healthcare.",0.0,0.1951219512195122,0.8828282828282829
+ healthcare,How often was your pain well controlled during this hospital stay?,"As a healthcare provider, I can't personally answer this question since it pertains to a patient's experience during a hospital stay. However, I can emphasize the importance of this question in understanding patient satisfaction and the effectiveness of pain management protocols. From my perspective, consistently managing pain well is crucial for patient recovery and overall experience. In developing a telemedicine app, ensuring that patients can easily communicate their pain levels and receive timely interventions is a priority, as it can enhance their comfort and trust in their care provider.","provider focused on developing a secure telemedicine app, I deeply understand the importance of effective pain management. While I haven't experienced hospital stays myself, I often hear from patients that they struggle with controlling their pain. That's why we're developing an app that integrates secure communication between patients and healthcare providers, ensuring that pain management plans are more effective and timely. By streamlining communication, we aim to empower patients to manage their pain more efficiently, reducing the stress and discomfort associated with traditional methods.",0.0,0.21348314606741575,0.990990990990991
+ healthcare,How often was your pain well controlled during this hospital stay?,"As a healthcare administrator, I don't personally experience the care, but I can share insights based on patient feedback from our recent equipment upgrade. Since upgrading our inventory, we've received positive feedback about improved pain management, largely due to advanced monitoring and updated analgesic delivery systems. Patients have reported more precise pain control and quicker response times from staff, as the new equipment allows for better tracking and notification. While there's always room for improvement, these upgrades have significantly contributed to patient comfort during their stay.","administrator, I don't personally experience pain during a hospital stay, but I can share insights based on feedback from patients and staff. In our recent upgrade, we prioritized acquiring advanced equipment that could accurately monitor and manage patient pain, ensuring timely and effective interventions. From what I've heard, patients are generally more satisfied with their pain management, thanks to more accurate assessments and tailored treatment plans. Of course, there's always room for improvement, and we're continually evaluating how to further enhance our pain management strategies.",0.21364596957236429,0.3595505617977528,0.9884057971014494
+ healthcare,"During this hospital stay, how often did nurses explain things in a way you could understand?","As a hospital administrator, I've observed that effective communication is key to improving patient care and satisfaction. Our nurses are highly trained and generally excel at explaining medical procedures and care plans in understandable terms. However, I also recognize there are occasional challenges due to the complexity of medical information. To address this, we're exploring innovative software solutions that can provide nurses with real-time access to simplified medical explanations and visual aids, ensuring that every patient fully comprehends their care. This initiative is part of our commitment to streamline processes and enhance the overall patient experience.","dministrator, I place a high priority on clear and effective communication between healthcare providers and patients, especially when it comes to explaining complex medical information. During this hospital stay, I found that nurses consistently took the time to break down medical procedures and treatment plans in a way that was easy to understand, using simple language and even providing visual aids when necessary. This approach not only helped in reducing anxiety but also empowered patients to make informed decisions about their care. Overall, the nurses' ability to communicate effectively was commendable and contributed to a more positive experience for everyone involved.",0.0,0.21,0.925068870523416
+ healthcare,How often was the area around your room quiet at night?,"In our facility, maintaining a quiet and restful environment at night is a top priority since it directly impacts patient recovery and satisfaction. We have implemented several noise-reduction strategies, such as soft-closing doors and designated quiet hours, to ensure a peaceful atmosphere. However, I acknowledge that there are occasional disturbances due to necessary medical procedures or emergencies, which we try to minimize as much as possible. We continually assess and improve our practices based on patient feedback to strike the right balance between essential care and maintaining tranquility.","a large healthcare facility, I understand the importance of a peaceful environment for patients, especially at night when restful sleep is crucial for recovery. I make it a priority to ensure our facility is as quiet as possible, especially around patient rooms. Our staff is trained to minimize noise, and we've implemented soundproofing measures where necessary. While there might be occasional disturbances, our team is proactive in addressing them, ensuring that patients can rest comfortably. Overall, I'd say our facility is quite tranquil at night, which is a testament to our commitment to patient care and well-being.",0.0,0.22105263157894736,0.8910052910052911
+ healthcare,"Before giving you any new medicine, how often did hospital staff describe possible side effects in a way you could understand?","As a healthcare researcher with a keen interest in neural networks for medical diagnosis, I find it crucial that communication about medication and its side effects is clear and comprehensible. In my experience, hospital staff generally do a good job of explaining potential side effects, though there can be variability depending on the staff's experience and workload. It's important for medical professionals to ensure that patients fully understand the information, as this can significantly impact patient outcomes and adherence to treatment. Ideally, I would like to see more consistent use of tools, possibly even AI-driven, to standardize and improve these communications. Such tools could provide tailored explanations based on a patient’s health literacy level, which I believe would be an excellent application of neural networks in healthcare.","researcher focused on neural networks, I deeply value clear communication and transparency in healthcare. In my experience, hospital staff consistently strive to explain potential side effects of new medications in a clear and understandable manner. They often use layman's terms or provide visual aids, which greatly enhances comprehension. However, there's always room for improvement, especially in ensuring that patients fully grasp the information, which is crucial for informed decision-making. Integrating neural networks could potentially enhance this process by providing more personalized explanations based on patient data.",0.09855180689319373,0.2545454545454545,0.7395833333333334
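
For readers unfamiliar with the `ROUGE-L` column, the sketch below shows the underlying idea: an F1 score built from the longest common subsequence (LCS) of reference and generated tokens. It is a minimal illustration using lowercase whitespace tokenization. The commit does not show which ROUGE implementation or tokenizer produced the CSV values, so this code is not expected to reproduce them exactly. `lcs_length` and `rouge_l_f1` are illustrative names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, generated: str) -> float:
    """ROUGE-L F1 over lowercase whitespace tokens (simplified tokenization)."""
    ref = reference.lower().split()
    gen = generated.lower().split()
    if not ref or not gen:
        return 0.0
    lcs = lcs_length(ref, gen)
    if lcs == 0:
        return 0.0
    precision = lcs / len(gen)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("the nurse answered the call quickly",
                 "the nurse quickly answered"))
```

Because the score depends on token order, two answers that cover the same points in a different sequence can score low. That may help explain why ROUGE-L in the table sits around 0.2 to 0.36 even where the `Survey Quality` column is close to 1.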
evaluation_viz/survey_metrics.png ADDED
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.51.1"
+ }