SudiptoPramanik committed on
Commit 8b64380 · verified · 1 Parent(s): 47bde41

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,55 +1,202 @@
  ---
  library_name: peft
- license: llama3.2
- base_model: meta-llama/Llama-3.2-1B
- tags:
- - generated_from_trainer
- model-index:
- - name: llama-finetuned
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # llama-finetuned

- This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on the None dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - num_epochs: 5
- - mixed_precision_training: Native AMP

- ### Training results

  ### Framework versions

- - PEFT 0.15.2
- - Transformers 4.52.4
- - Pytorch 2.6.0+cu124
- - Datasets 2.14.4
- - Tokenizers 0.21.1
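The hyperparameter list removed above maps directly onto `transformers.TrainingArguments`; a minimal, hedged sketch of that mapping (the `output_dir` value is an illustrative assumption — the original training script is not part of this commit, and the import is deferred so the function stays a sketch):

```python
def build_training_args(output_dir="llama-finetuned"):
    """Reconstruct the removed hyperparameter list as TrainingArguments.

    The Trainer's optimizer defaults (AdamW with betas=(0.9, 0.999),
    eps=1e-08) already match the values stated in the old card.
    """
    from transformers import TrainingArguments  # requires `transformers`

    return TrainingArguments(
        output_dir=output_dir,          # assumption: not recorded in the card
        learning_rate=5e-05,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        seed=42,
        lr_scheduler_type="linear",
        num_train_epochs=5,
        fp16=True,                      # "Native AMP" mixed precision
    )
```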
 
  ---
+ base_model: meta-llama/Llama-3.2-3B-Instruct
  library_name: peft
  ---

+ # Model Card for Model ID

+ <!-- Provide a quick summary of what the model is/does. -->

+ ## Model Details

+ ### Model Description

+ <!-- Provide a longer summary of what this model is. -->

+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]

+ ### Model Sources [optional]

+ <!-- Provide the basic links for the model. -->

+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]

+ ## Uses

+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

+ ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ [More Information Needed]

+ ### Downstream Use [optional]

+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ [More Information Needed]

+ ### Out-of-Scope Use

+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

+ [More Information Needed]

+ ## Bias, Risks, and Limitations

+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ [More Information Needed]

+ ### Recommendations

+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

+ ## How to Get Started with the Model

+ Use the code below to get started with the model.

+ [More Information Needed]

+ ## Training Details

+ ### Training Data

+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ [More Information Needed]

+ ### Training Procedure

+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ #### Preprocessing [optional]

+ [More Information Needed]

+ #### Training Hyperparameters

+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

+ #### Speeds, Sizes, Times [optional]

+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

+ [More Information Needed]

+ ## Evaluation

+ <!-- This section describes the evaluation protocols and provides the results. -->

+ ### Testing Data, Factors & Metrics

+ #### Testing Data

+ <!-- This should link to a Dataset Card if possible. -->

+ [More Information Needed]

+ #### Factors

+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

+ [More Information Needed]

+ #### Metrics

+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->

+ [More Information Needed]

+ ### Results

+ [More Information Needed]

+ #### Summary

+ ## Model Examination [optional]

+ <!-- Relevant interpretability work for the model goes here -->

+ [More Information Needed]

+ ## Environmental Impact

+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]

+ ## Technical Specifications [optional]

+ ### Model Architecture and Objective

+ [More Information Needed]

+ ### Compute Infrastructure

+ [More Information Needed]

+ #### Hardware

+ [More Information Needed]

+ #### Software

+ [More Information Needed]

+ ## Citation [optional]

+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

+ **BibTeX:**

+ [More Information Needed]

+ **APA:**

+ [More Information Needed]

+ ## Glossary [optional]

+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

+ [More Information Needed]

+ ## More Information [optional]

+ [More Information Needed]

+ ## Model Card Authors [optional]

+ [More Information Needed]

+ ## Model Card Contact

+ [More Information Needed]
  ### Framework versions

+ - PEFT 0.15.2
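The new card's "How to Get Started" section is still a placeholder. A hedged loading sketch for a PEFT adapter over the new base model (the adapter repo id below is a hypothetical placeholder, not taken from this commit, and the imports are deferred so the function is only a sketch):

```python
def load_finetuned(adapter_repo="your-username/llama-finetuned"):
    """Load the 3B-Instruct base model and attach the LoRA adapter.

    `adapter_repo` is a placeholder -- substitute the actual Hub id of
    the uploaded adapter before use.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer  # requires `transformers`
    from peft import PeftModel  # requires `peft`

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
    model = PeftModel.from_pretrained(base, adapter_repo)
    tokenizer = AutoTokenizer.from_pretrained(adapter_repo)
    return model, tokenizer
```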
 
 
 
 
adapter_config.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
- "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
@@ -24,10 +24,10 @@
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
- "v_proj",
28
- "q_proj",
29
  "k_proj",
30
- "o_proj"
 
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
 
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
+ "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
 
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
+ "o_proj",
 
28
  "k_proj",
29
+ "v_proj",
30
+ "q_proj"
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:85a94d974409456e3c95935ba3868ea2a1ce6587e7ca88a8214846c9ee0130dd
- size 6832520

  version https://git-lfs.github.com/spec/v1
+ oid sha256:cc7ea701768cfbdaa71d079b215f5f549a6f19783aa970133015c0bcd11942a9
+ size 18379784
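The adapter file roughly tripled because the base model grew from 1B to 3B parameters and the LoRA configuration apparently changed. A back-of-the-envelope check, assuming rank r=16 and the published Llama-3.2-3B attention dimensions (hidden size 3072, 28 decoder layers, 1024-dim k/v projections) — all assumptions, since the rank is not visible in this diff:

```python
# LoRA adds two low-rank factors per targeted linear layer, so each of
# q/k/v/o_proj contributes r * (in_features + out_features) parameters.
r = 16            # assumed LoRA rank (not shown in this diff)
hidden = 3072     # Llama-3.2-3B hidden size; q_proj/o_proj map 3072 -> 3072
kv_out = 1024     # k_proj/v_proj output width (8 kv heads x head_dim 128)
layers = 28       # decoder layers in Llama-3.2-3B

per_layer = (
    r * (hidden + hidden)    # q_proj
    + r * (hidden + kv_out)  # k_proj
    + r * (hidden + kv_out)  # v_proj
    + r * (hidden + hidden)  # o_proj
)
total_params = per_layer * layers
fp16_bytes = total_params * 2
print(total_params, fp16_bytes)  # 9175040 18350080
```

18,350,080 bytes of fp16 weights plus safetensors metadata is consistent with the new 18,379,784-byte file, which is why r=16 is a plausible (but unconfirmed) guess.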
checkpoint-500/README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- base_model: meta-llama/Llama-3.2-1B
  library_name: peft
  ---

  ---
+ base_model: meta-llama/Llama-3.2-3B-Instruct
  library_name: peft
  ---
checkpoint-500/adapter_config.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
- "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
@@ -24,10 +24,10 @@
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
- "v_proj",
28
- "q_proj",
29
  "k_proj",
30
- "o_proj"
 
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
 
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
+ "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
 
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
+ "o_proj",
 
28
  "k_proj",
29
+ "v_proj",
30
+ "q_proj"
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
checkpoint-500/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:51dd1b6188ef2fbfbd1f1930c34693669e7f1411dc718ebc28dee1741bb7c994
- size 6832520

  version https://git-lfs.github.com/spec/v1
+ oid sha256:8b8259114c246ec9d021c0dc592a86039fa551ea32044b58899cb2e13eac109f
+ size 18379784
checkpoint-500/chat_template.jinja ADDED
@@ -0,0 +1,93 @@
+ {{- bos_token }}
+ {%- if custom_tools is defined %}
+ {%- set tools = custom_tools %}
+ {%- endif %}
+ {%- if not tools_in_user_message is defined %}
+ {%- set tools_in_user_message = true %}
+ {%- endif %}
+ {%- if not date_string is defined %}
+ {%- if strftime_now is defined %}
+ {%- set date_string = strftime_now("%d %b %Y") %}
+ {%- else %}
+ {%- set date_string = "26 Jul 2024" %}
+ {%- endif %}
+ {%- endif %}
+ {%- if not tools is defined %}
+ {%- set tools = none %}
+ {%- endif %}
+
+ {#- This block extracts the system message, so we can slot it into the right place. #}
+ {%- if messages[0]['role'] == 'system' %}
+ {%- set system_message = messages[0]['content']|trim %}
+ {%- set messages = messages[1:] %}
+ {%- else %}
+ {%- set system_message = "" %}
+ {%- endif %}
+
+ {#- System message #}
+ {{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
+ {%- if tools is not none %}
+ {{- "Environment: ipython\n" }}
+ {%- endif %}
+ {{- "Cutting Knowledge Date: December 2023\n" }}
+ {{- "Today Date: " + date_string + "\n\n" }}
+ {%- if tools is not none and not tools_in_user_message %}
+ {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+ {{- "Do not use variables.\n\n" }}
+ {%- for t in tools %}
+ {{- t | tojson(indent=4) }}
+ {{- "\n\n" }}
+ {%- endfor %}
+ {%- endif %}
+ {{- system_message }}
+ {{- "<|eot_id|>" }}
+
+ {#- Custom tools are passed in a user message with some extra guidance #}
+ {%- if tools_in_user_message and not tools is none %}
+ {#- Extract the first user message so we can plug it in here #}
+ {%- if messages | length != 0 %}
+ {%- set first_user_message = messages[0]['content']|trim %}
+ {%- set messages = messages[1:] %}
+ {%- else %}
+ {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
+ {%- endif %}
+ {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
+ {{- "Given the following functions, please respond with a JSON for a function call " }}
+ {{- "with its proper arguments that best answers the given prompt.\n\n" }}
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+ {{- "Do not use variables.\n\n" }}
+ {%- for t in tools %}
+ {{- t | tojson(indent=4) }}
+ {{- "\n\n" }}
+ {%- endfor %}
+ {{- first_user_message + "<|eot_id|>"}}
+ {%- endif %}
+
+ {%- for message in messages %}
+ {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
+ {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
+ {%- elif 'tool_calls' in message %}
+ {%- if not message.tool_calls|length == 1 %}
+ {{- raise_exception("This model only supports single tool-calls at once!") }}
+ {%- endif %}
+ {%- set tool_call = message.tool_calls[0].function %}
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
+ {{- '{"name": "' + tool_call.name + '", ' }}
+ {{- '"parameters": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- "}" }}
+ {{- "<|eot_id|>" }}
+ {%- elif message.role == "tool" or message.role == "ipython" %}
+ {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
+ {%- if message.content is mapping or message.content is iterable %}
+ {{- message.content | tojson }}
+ {%- else %}
+ {{- message.content }}
+ {%- endif %}
+ {{- "<|eot_id|>" }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
+ {%- endif %}
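For the common no-tools case, the template above reduces to a fixed frame around the messages. A plain-Python re-enactment of that case (assuming `add_generation_prompt=True`, no tools, and the Llama `<|begin_of_text|>` BOS token; in practice you would call `tokenizer.apply_chat_template` rather than formatting strings by hand):

```python
def render(system_message, user_message, date_string="26 Jul 2024"):
    """Mimic the Jinja template's output for one system + user turn, no tools."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        "Cutting Knowledge Date: December 2023\n"
        f"Today Date: {date_string}\n\n"
        f"{system_message}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"  # generation prompt
    )

prompt = render("You are a helpful assistant.", "Hello!")
print(prompt)
```

Note that each completed turn ends with `<|eot_id|>`, which is exactly why this commit retargets the tokenizer's eos/pad tokens below.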
checkpoint-500/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:38c9a3232ca6a60b67a25870c57d11d46276154b2d6b15b197b226df5fe7baa7
- size 13739130

  version https://git-lfs.github.com/spec/v1
+ oid sha256:3317afcdcf3af4f5f4ac06a49cdec7ac76a81a4e0df552be3350fb47836d8723
+ size 36888186
checkpoint-500/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b5510d5675ca4cc88865f7333ed93c57de3a9c0ab6785b70b748e89d23dc14f3
  size 14244

  version https://git-lfs.github.com/spec/v1
+ oid sha256:766e7a0607dfbb8fc62a3e9fdbd70a306ec1cbc12acb22a2ca3051403cb0f501
  size 14244
checkpoint-500/special_tokens_map.json CHANGED
@@ -7,11 +7,11 @@
    "single_word": false
  },
  "eos_token": {
-   "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
- "pad_token": "<|end_of_text|>"
  }

    "single_word": false
  },
  "eos_token": {
+   "content": "<|eot_id|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
+ "pad_token": "<|eot_id|>"
  }
checkpoint-500/tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a9d4fd2d4afa82d8a7dadae3490fdc20b26f06e32cec78a8dc96521b4dc79038
- size 17210200

  version https://git-lfs.github.com/spec/v1
+ oid sha256:c70650b4236027dc8db4abca6b918783a8ed2ee38cd69142f6dbbeb5945f876f
+ size 17210195
checkpoint-500/tokenizer_config.json CHANGED
@@ -2051,13 +2051,13 @@
  },
  "bos_token": "<|begin_of_text|>",
  "clean_up_tokenization_spaces": true,
- "eos_token": "<|end_of_text|>",
  "extra_special_tokens": {},
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 131072,
- "pad_token": "<|end_of_text|>",
  "tokenizer_class": "PreTrainedTokenizer"
  }

  },
  "bos_token": "<|begin_of_text|>",
  "clean_up_tokenization_spaces": true,
+ "eos_token": "<|eot_id|>",
  "extra_special_tokens": {},
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 131072,
+ "pad_token": "<|eot_id|>",
  "tokenizer_class": "PreTrainedTokenizer"
  }
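The eos/pad retarget in this file (from `<|end_of_text|>` to `<|eot_id|>`) is the usual adjustment for chat fine-tunes: `<|eot_id|>` terminates every assistant turn, so generation stops after one reply instead of running on toward the document-level end token. A hedged sketch of how this is typically done in code (the import is deferred, so the function itself is only illustrative):

```python
def retarget_special_tokens(model_id="meta-llama/Llama-3.2-3B-Instruct"):
    """Point eos/pad at the per-turn terminator used by the chat template."""
    from transformers import AutoTokenizer  # requires `transformers`

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.eos_token = "<|eot_id|>"          # stop after each assistant turn
    tokenizer.pad_token = tokenizer.eos_token   # Llama ships no dedicated pad token
    return tokenizer
```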
checkpoint-500/trainer_state.json CHANGED
@@ -11,352 +11,352 @@
The `log_history` entries were regenerated for the new run: `epoch`, `learning_rate`, and `step` are unchanged context, while `grad_norm` and `loss` were updated. Summarized below (`grad_norm` rounded to three decimals; the new-side log is truncated in this view after step 440, so those cells are marked —):

| step | epoch | learning_rate | grad_norm (old → new) | loss (old → new) |
|-----:|------:|--------------:|----------------------:|-----------------:|
| 10 | 0.074 | 4.933e-05 | 1.918 → 1.603 | 3.2618 → 3.0632 |
| 20 | 0.148 | 4.859e-05 | 2.166 → 1.629 | 2.9008 → 2.7368 |
| 30 | 0.222 | 4.785e-05 | 3.131 → 2.152 | 2.4903 → 2.3871 |
| 40 | 0.296 | 4.711e-05 | 3.064 → 1.800 | 2.1276 → 1.9196 |
| 50 | 0.370 | 4.637e-05 | 3.167 → 1.898 | 1.9145 → 1.3918 |
| 60 | 0.444 | 4.563e-05 | 3.497 → 1.896 | 1.7413 → 1.5302 |
| 70 | 0.519 | 4.489e-05 | 2.274 → 2.118 | 1.5862 → 1.5947 |
| 80 | 0.593 | 4.415e-05 | 2.132 → 1.489 | 1.5853 → 1.719 |
| 90 | 0.667 | 4.341e-05 | 2.534 → 1.629 | 1.4492 → 1.5577 |
| 100 | 0.741 | 4.267e-05 | 2.549 → 1.834 | 1.5561 → 1.5306 |
| 110 | 0.815 | 4.193e-05 | 2.276 → 1.640 | 1.376 → 1.4556 |
| 120 | 0.889 | 4.119e-05 | 2.932 → 2.472 | 1.5952 → 1.3542 |
| 130 | 0.963 | 4.044e-05 | 2.639 → 2.038 | 1.567 → 1.4318 |
| 140 | 1.037 | 3.970e-05 | 2.196 → 1.808 | 1.4817 → 1.325 |
| 150 | 1.111 | 3.896e-05 | 2.787 → 1.950 | 1.4706 → 1.4448 |
| 160 | 1.185 | 3.822e-05 | 3.132 → 1.963 | 1.5353 → 1.5878 |
| 170 | 1.259 | 3.748e-05 | 2.852 → 1.751 | 1.4258 → 1.3959 |
| 180 | 1.333 | 3.674e-05 | 2.733 → 2.053 | 1.4788 → 1.3011 |
| 190 | 1.407 | 3.600e-05 | 2.585 → 2.043 | 1.3938 → 1.4573 |
| 200 | 1.481 | 3.526e-05 | 2.708 → 2.036 | 1.4244 → 1.1328 |
| 210 | 1.556 | 3.452e-05 | 2.701 → 1.715 | 1.575 → 1.4789 |
| 220 | 1.630 | 3.378e-05 | 2.432 → 2.452 | 1.6532 → 1.5025 |
| 230 | 1.704 | 3.304e-05 | 2.494 → 1.901 | 1.5417 → 1.4632 |
| 240 | 1.778 | 3.230e-05 | 2.872 → 2.464 | 1.3412 → 1.3317 |
| 250 | 1.852 | 3.156e-05 | 3.256 → 2.167 | 1.4949 → 1.4509 |
| 260 | 1.926 | 3.081e-05 | 3.067 → 2.021 | 1.4992 → 1.3454 |
| 270 | 2.000 | 3.007e-05 | 3.030 → 2.484 | 1.5028 → 1.469 |
| 280 | 2.074 | 2.933e-05 | 3.697 → 2.236 | 1.4106 → 1.4094 |
| 290 | 2.148 | 2.859e-05 | 4.046 → 1.842 | 1.3073 → 1.3593 |
| 300 | 2.222 | 2.785e-05 | 3.199 → 2.261 | 1.4697 → 1.2538 |
| 310 | 2.296 | 2.711e-05 | 2.752 → 2.420 | 1.5942 → 1.3069 |
| 320 | 2.370 | 2.637e-05 | 2.622 → 1.993 | 1.529 → 1.3721 |
| 330 | 2.444 | 2.563e-05 | 3.084 → 1.749 | 1.3467 → 1.4116 |
| 340 | 2.519 | 2.489e-05 | 3.732 → 2.112 | 1.4712 → 1.4882 |
| 350 | 2.593 | 2.415e-05 | 3.473 → 2.643 | 1.366 → 1.3898 |
| 360 | 2.667 | 2.341e-05 | 3.592 → 2.421 | 1.4706 → 1.3122 |
| 370 | 2.741 | 2.267e-05 | 2.644 → 2.674 | 1.4946 → 1.4165 |
| 380 | 2.815 | 2.193e-05 | 4.660 → 2.851 | 1.4062 → 1.3389 |
| 390 | 2.889 | 2.119e-05 | 3.323 → 2.469 | 1.4974 → 1.2991 |
| 400 | 2.963 | 2.044e-05 | 2.832 → 2.734 | 1.3854 → 1.259 |
| 410 | 3.037 | 1.970e-05 | 2.911 → 1.964 | 1.4369 → 1.5459 |
| 420 | 3.111 | 1.896e-05 | 3.241 → 2.067 | 1.343 → 1.3024 |
| 430 | 3.185 | 1.822e-05 | 3.537 → 2.377 | 1.3801 → 1.4858 |
| 440 | 3.259 | 1.748e-05 | 3.054 → 3.471 | 1.4533 → 1.3467 |
| 450 | 3.333 | 1.674e-05 | 4.252 → — | 1.4991 → — |
| 460 | 3.407 | 1.600e-05 | 2.947 → — | 1.3662 → — |
| 470 | 3.481 | 1.526e-05 | 3.285 → — | 1.3832 → — |
| 480 | 3.556 | 1.452e-05 | 3.081 → — | 1.4724 → — |
| 490 | 3.630 | 1.378e-05 | 2.596 → — | 1.3091 → — |
| 500 | 3.704 | 1.304e-05 | 3.942 → — | 1.371 → — |

A second hunk, `@@ -377,7 +377,7 @@`, removes `"total_flos": 2994739347456000.0`; the replacement value falls outside the truncated portion shown here. The surrounding `"train_batch_size": 2`, `"trial_name": null`, and `"trial_params": null` lines are unchanged context.
  "epoch": 3.3333333333333335,
322
+ "grad_norm": 2.3406922817230225,
323
  "learning_rate": 1.674074074074074e-05,
324
+ "loss": 1.3619,
325
  "step": 450
326
  },
327
  {
328
  "epoch": 3.4074074074074074,
329
+ "grad_norm": 2.3285129070281982,
330
  "learning_rate": 1.6000000000000003e-05,
331
+ "loss": 1.4078,
332
  "step": 460
333
  },
334
  {
335
  "epoch": 3.4814814814814814,
336
+ "grad_norm": 2.5264031887054443,
337
  "learning_rate": 1.5259259259259258e-05,
338
+ "loss": 1.1562,
339
  "step": 470
340
  },
341
  {
342
  "epoch": 3.5555555555555554,
343
+ "grad_norm": 2.290501594543457,
344
  "learning_rate": 1.4518518518518521e-05,
345
+ "loss": 1.3399,
346
  "step": 480
347
  },
348
  {
349
  "epoch": 3.6296296296296298,
350
+ "grad_norm": 3.063209056854248,
351
  "learning_rate": 1.3777777777777778e-05,
352
+ "loss": 1.1793,
353
  "step": 490
354
  },
355
  {
356
  "epoch": 3.7037037037037037,
357
+ "grad_norm": 2.8260083198547363,
358
  "learning_rate": 1.3037037037037036e-05,
359
+ "loss": 1.4168,
360
  "step": 500
361
  }
362
  ],
 
377
  "attributes": {}
378
  }
379
  },
380
+ "total_flos": 8673284849664000.0,
381
  "train_batch_size": 2,
382
  "trial_name": null,
383
  "trial_params": null
checkpoint-500/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e97160594c45d89ea3fd9c68265c308064022978d7a1e8dc093e9bed35cf1cf7
3
  size 5304
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c06b70c6d22e534aee54e60ea3091f1eeba55994a544d47464b4a805ef2ab30e
3
  size 5304
checkpoint-675/README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- base_model: meta-llama/Llama-3.2-1B
3
  library_name: peft
4
  ---
5
 
 
1
  ---
2
+ base_model: meta-llama/Llama-3.2-3B-Instruct
3
  library_name: peft
4
  ---
5
 
checkpoint-675/adapter_config.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
- "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
@@ -24,10 +24,10 @@
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
- "v_proj",
28
- "q_proj",
29
  "k_proj",
30
- "o_proj"
 
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
 
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
+ "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
5
  "bias": "none",
6
  "corda_config": null,
7
  "eva_config": null,
 
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
+ "o_proj",
 
28
  "k_proj",
29
+ "v_proj",
30
+ "q_proj"
31
  ],
32
  "task_type": "CAUSAL_LM",
33
  "trainable_token_indices": null,
checkpoint-675/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:85a94d974409456e3c95935ba3868ea2a1ce6587e7ca88a8214846c9ee0130dd
3
- size 6832520
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cc7ea701768cfbdaa71d079b215f5f549a6f19783aa970133015c0bcd11942a9
3
+ size 18379784
checkpoint-675/chat_template.jinja ADDED
@@ -0,0 +1,93 @@
1
+ {{- bos_token }}
2
+ {%- if custom_tools is defined %}
3
+ {%- set tools = custom_tools %}
4
+ {%- endif %}
5
+ {%- if not tools_in_user_message is defined %}
6
+ {%- set tools_in_user_message = true %}
7
+ {%- endif %}
8
+ {%- if not date_string is defined %}
9
+ {%- if strftime_now is defined %}
10
+ {%- set date_string = strftime_now("%d %b %Y") %}
11
+ {%- else %}
12
+ {%- set date_string = "26 Jul 2024" %}
13
+ {%- endif %}
14
+ {%- endif %}
15
+ {%- if not tools is defined %}
16
+ {%- set tools = none %}
17
+ {%- endif %}
18
+
19
+ {#- This block extracts the system message, so we can slot it into the right place. #}
20
+ {%- if messages[0]['role'] == 'system' %}
21
+ {%- set system_message = messages[0]['content']|trim %}
22
+ {%- set messages = messages[1:] %}
23
+ {%- else %}
24
+ {%- set system_message = "" %}
25
+ {%- endif %}
26
+
27
+ {#- System message #}
28
+ {{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
29
+ {%- if tools is not none %}
30
+ {{- "Environment: ipython\n" }}
31
+ {%- endif %}
32
+ {{- "Cutting Knowledge Date: December 2023\n" }}
33
+ {{- "Today Date: " + date_string + "\n\n" }}
34
+ {%- if tools is not none and not tools_in_user_message %}
35
+ {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
36
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
37
+ {{- "Do not use variables.\n\n" }}
38
+ {%- for t in tools %}
39
+ {{- t | tojson(indent=4) }}
40
+ {{- "\n\n" }}
41
+ {%- endfor %}
42
+ {%- endif %}
43
+ {{- system_message }}
44
+ {{- "<|eot_id|>" }}
45
+
46
+ {#- Custom tools are passed in a user message with some extra guidance #}
47
+ {%- if tools_in_user_message and not tools is none %}
48
+ {#- Extract the first user message so we can plug it in here #}
49
+ {%- if messages | length != 0 %}
50
+ {%- set first_user_message = messages[0]['content']|trim %}
51
+ {%- set messages = messages[1:] %}
52
+ {%- else %}
53
+ {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
54
+ {%- endif %}
55
+ {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
56
+ {{- "Given the following functions, please respond with a JSON for a function call " }}
57
+ {{- "with its proper arguments that best answers the given prompt.\n\n" }}
58
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
59
+ {{- "Do not use variables.\n\n" }}
60
+ {%- for t in tools %}
61
+ {{- t | tojson(indent=4) }}
62
+ {{- "\n\n" }}
63
+ {%- endfor %}
64
+ {{- first_user_message + "<|eot_id|>"}}
65
+ {%- endif %}
66
+
67
+ {%- for message in messages %}
68
+ {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
69
+ {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
70
+ {%- elif 'tool_calls' in message %}
71
+ {%- if not message.tool_calls|length == 1 %}
72
+ {{- raise_exception("This model only supports single tool-calls at once!") }}
73
+ {%- endif %}
74
+ {%- set tool_call = message.tool_calls[0].function %}
75
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
76
+ {{- '{"name": "' + tool_call.name + '", ' }}
77
+ {{- '"parameters": ' }}
78
+ {{- tool_call.arguments | tojson }}
79
+ {{- "}" }}
80
+ {{- "<|eot_id|>" }}
81
+ {%- elif message.role == "tool" or message.role == "ipython" %}
82
+ {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
83
+ {%- if message.content is mapping or message.content is iterable %}
84
+ {{- message.content | tojson }}
85
+ {%- else %}
86
+ {{- message.content }}
87
+ {%- endif %}
88
+ {{- "<|eot_id|>" }}
89
+ {%- endif %}
90
+ {%- endfor %}
91
+ {%- if add_generation_prompt %}
92
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
93
+ {%- endif %}
checkpoint-675/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e16eb51edd416344ec01ae08161f75e5b39c6a771c90092b2efd6bfbe216820b
3
- size 13739130
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:575fbe3b0aeb9ea62a324077fb9c3c2cbe9882ac13f98d5feb1e3150f6354d2b
3
+ size 36888186
checkpoint-675/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:6f2ea9c0c0d5c060e3f0c36ca552127cfbc3cb0e8231b97b11065f63c83d513f
3
  size 14244
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11f806a90936250186b7351d45b954a2dacb8a2cb0336a0049a5107f2a56eceb
3
  size 14244
checkpoint-675/special_tokens_map.json CHANGED
@@ -7,11 +7,11 @@
7
  "single_word": false
8
  },
9
  "eos_token": {
10
- "content": "<|end_of_text|>",
11
  "lstrip": false,
12
  "normalized": false,
13
  "rstrip": false,
14
  "single_word": false
15
  },
16
- "pad_token": "<|end_of_text|>"
17
  }
 
7
  "single_word": false
8
  },
9
  "eos_token": {
10
+ "content": "<|eot_id|>",
11
  "lstrip": false,
12
  "normalized": false,
13
  "rstrip": false,
14
  "single_word": false
15
  },
16
+ "pad_token": "<|eot_id|>"
17
  }
checkpoint-675/tokenizer.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a9d4fd2d4afa82d8a7dadae3490fdc20b26f06e32cec78a8dc96521b4dc79038
3
- size 17210200
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c70650b4236027dc8db4abca6b918783a8ed2ee38cd69142f6dbbeb5945f876f
3
+ size 17210195
checkpoint-675/tokenizer_config.json CHANGED
@@ -2051,13 +2051,13 @@
2051
  },
2052
  "bos_token": "<|begin_of_text|>",
2053
  "clean_up_tokenization_spaces": true,
2054
- "eos_token": "<|end_of_text|>",
2055
  "extra_special_tokens": {},
2056
  "model_input_names": [
2057
  "input_ids",
2058
  "attention_mask"
2059
  ],
2060
  "model_max_length": 131072,
2061
- "pad_token": "<|end_of_text|>",
2062
  "tokenizer_class": "PreTrainedTokenizer"
2063
  }
 
2051
  },
2052
  "bos_token": "<|begin_of_text|>",
2053
  "clean_up_tokenization_spaces": true,
2054
+ "eos_token": "<|eot_id|>",
2055
  "extra_special_tokens": {},
2056
  "model_input_names": [
2057
  "input_ids",
2058
  "attention_mask"
2059
  ],
2060
  "model_max_length": 131072,
2061
+ "pad_token": "<|eot_id|>",
2062
  "tokenizer_class": "PreTrainedTokenizer"
2063
  }
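The tokenizer change above (eos_token and pad_token moved from <|end_of_text|> to <|eot_id|>) matches the switch to the Instruct base model: generation then stops at the end of an assistant turn rather than only at end-of-text. The changed fields as plain data (sketch):

```python
# Sketch of the tokenizer_config.json fields this commit changes;
# padding with the eos token avoids adding a separate pad embedding.
tokenizer_config = {
    "bos_token": "<|begin_of_text|>",
    "eos_token": "<|eot_id|>",
    "pad_token": "<|eot_id|>",
    "model_max_length": 131072,
}

assert tokenizer_config["pad_token"] == tokenizer_config["eos_token"]
```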
checkpoint-675/trainer_state.json CHANGED
@@ -11,471 +11,471 @@
11
  "log_history": [
12
  {
13
  "epoch": 0.07407407407407407,
14
- "grad_norm": 1.9176784753799438,
15
  "learning_rate": 4.933333333333334e-05,
16
- "loss": 3.2618,
17
  "step": 10
18
  },
19
  {
20
  "epoch": 0.14814814814814814,
21
- "grad_norm": 2.166156053543091,
22
  "learning_rate": 4.8592592592592596e-05,
23
- "loss": 2.9008,
24
  "step": 20
25
  },
26
  {
27
  "epoch": 0.2222222222222222,
28
- "grad_norm": 3.130929470062256,
29
  "learning_rate": 4.7851851851851854e-05,
30
- "loss": 2.4903,
31
  "step": 30
32
  },
33
  {
34
  "epoch": 0.2962962962962963,
35
- "grad_norm": 3.0643246173858643,
36
  "learning_rate": 4.711111111111111e-05,
37
- "loss": 2.1276,
38
  "step": 40
39
  },
40
  {
41
  "epoch": 0.37037037037037035,
42
- "grad_norm": 3.166916847229004,
43
  "learning_rate": 4.637037037037038e-05,
44
- "loss": 1.9145,
45
  "step": 50
46
  },
47
  {
48
  "epoch": 0.4444444444444444,
49
- "grad_norm": 3.496783494949341,
50
  "learning_rate": 4.5629629629629636e-05,
51
- "loss": 1.7413,
52
  "step": 60
53
  },
54
  {
55
  "epoch": 0.5185185185185185,
56
- "grad_norm": 2.274343490600586,
57
  "learning_rate": 4.4888888888888894e-05,
58
- "loss": 1.5862,
59
  "step": 70
60
  },
61
  {
62
  "epoch": 0.5925925925925926,
63
- "grad_norm": 2.1317834854125977,
64
  "learning_rate": 4.414814814814815e-05,
65
- "loss": 1.5853,
66
  "step": 80
67
  },
68
  {
69
  "epoch": 0.6666666666666666,
70
- "grad_norm": 2.5336570739746094,
71
  "learning_rate": 4.340740740740741e-05,
72
- "loss": 1.4492,
73
  "step": 90
74
  },
75
  {
76
  "epoch": 0.7407407407407407,
77
- "grad_norm": 2.5489163398742676,
78
  "learning_rate": 4.266666666666667e-05,
79
- "loss": 1.5561,
80
  "step": 100
81
  },
82
  {
83
  "epoch": 0.8148148148148148,
84
- "grad_norm": 2.276472568511963,
85
  "learning_rate": 4.192592592592593e-05,
86
- "loss": 1.376,
87
  "step": 110
88
  },
89
  {
90
  "epoch": 0.8888888888888888,
91
- "grad_norm": 2.9320948123931885,
92
  "learning_rate": 4.1185185185185186e-05,
93
- "loss": 1.5952,
94
  "step": 120
95
  },
96
  {
97
  "epoch": 0.9629629629629629,
98
- "grad_norm": 2.639327049255371,
99
  "learning_rate": 4.0444444444444444e-05,
100
- "loss": 1.567,
101
  "step": 130
102
  },
103
  {
104
  "epoch": 1.037037037037037,
105
- "grad_norm": 2.1957807540893555,
106
  "learning_rate": 3.97037037037037e-05,
107
- "loss": 1.4817,
108
  "step": 140
109
  },
110
  {
111
  "epoch": 1.1111111111111112,
112
- "grad_norm": 2.7867722511291504,
113
  "learning_rate": 3.896296296296296e-05,
114
- "loss": 1.4706,
115
  "step": 150
116
  },
117
  {
118
  "epoch": 1.1851851851851851,
119
- "grad_norm": 3.132254123687744,
120
  "learning_rate": 3.8222222222222226e-05,
121
- "loss": 1.5353,
122
  "step": 160
123
  },
124
  {
125
  "epoch": 1.2592592592592593,
126
- "grad_norm": 2.851921319961548,
127
  "learning_rate": 3.7481481481481484e-05,
128
- "loss": 1.4258,
129
  "step": 170
130
  },
131
  {
132
  "epoch": 1.3333333333333333,
133
- "grad_norm": 2.733062505722046,
134
  "learning_rate": 3.674074074074074e-05,
135
- "loss": 1.4788,
136
  "step": 180
137
  },
138
  {
139
  "epoch": 1.4074074074074074,
140
- "grad_norm": 2.58499813079834,
141
  "learning_rate": 3.6e-05,
142
- "loss": 1.3938,
143
  "step": 190
144
  },
145
  {
146
  "epoch": 1.4814814814814814,
147
- "grad_norm": 2.7078592777252197,
148
  "learning_rate": 3.525925925925926e-05,
149
- "loss": 1.4244,
150
  "step": 200
151
  },
152
  {
153
  "epoch": 1.5555555555555556,
154
- "grad_norm": 2.7007601261138916,
155
  "learning_rate": 3.4518518518518524e-05,
156
- "loss": 1.575,
157
  "step": 210
158
  },
159
  {
160
  "epoch": 1.6296296296296298,
161
- "grad_norm": 2.4323105812072754,
162
  "learning_rate": 3.377777777777778e-05,
163
- "loss": 1.6532,
164
  "step": 220
165
  },
166
  {
167
  "epoch": 1.7037037037037037,
168
- "grad_norm": 2.4938671588897705,
169
  "learning_rate": 3.303703703703704e-05,
170
- "loss": 1.5417,
171
  "step": 230
172
  },
173
  {
174
  "epoch": 1.7777777777777777,
175
- "grad_norm": 2.872101068496704,
176
  "learning_rate": 3.22962962962963e-05,
177
- "loss": 1.3412,
178
  "step": 240
179
  },
180
  {
181
  "epoch": 1.8518518518518519,
182
- "grad_norm": 3.255509614944458,
183
  "learning_rate": 3.155555555555556e-05,
184
- "loss": 1.4949,
185
  "step": 250
186
  },
187
  {
188
  "epoch": 1.925925925925926,
189
- "grad_norm": 3.0668418407440186,
190
  "learning_rate": 3.0814814814814816e-05,
191
- "loss": 1.4992,
192
  "step": 260
193
  },
194
  {
195
  "epoch": 2.0,
196
- "grad_norm": 3.030184745788574,
197
  "learning_rate": 3.0074074074074078e-05,
198
- "loss": 1.5028,
199
  "step": 270
200
  },
201
  {
202
  "epoch": 2.074074074074074,
203
- "grad_norm": 3.6970374584198,
204
  "learning_rate": 2.9333333333333336e-05,
205
- "loss": 1.4106,
206
  "step": 280
207
  },
208
  {
209
  "epoch": 2.148148148148148,
210
- "grad_norm": 4.04591178894043,
211
  "learning_rate": 2.8592592592592594e-05,
212
- "loss": 1.3073,
213
  "step": 290
214
  },
215
  {
216
  "epoch": 2.2222222222222223,
217
- "grad_norm": 3.198578357696533,
218
  "learning_rate": 2.7851851851851853e-05,
219
- "loss": 1.4697,
220
  "step": 300
221
  },
222
  {
223
  "epoch": 2.2962962962962963,
224
- "grad_norm": 2.752206802368164,
225
  "learning_rate": 2.7111111111111114e-05,
226
- "loss": 1.5942,
227
  "step": 310
228
  },
229
  {
230
  "epoch": 2.3703703703703702,
231
- "grad_norm": 2.6222379207611084,
232
  "learning_rate": 2.6370370370370373e-05,
233
- "loss": 1.529,
234
  "step": 320
235
  },
236
  {
237
  "epoch": 2.4444444444444446,
238
- "grad_norm": 3.0837435722351074,
239
  "learning_rate": 2.562962962962963e-05,
240
- "loss": 1.3467,
241
  "step": 330
242
  },
243
  {
244
  "epoch": 2.5185185185185186,
245
- "grad_norm": 3.7321062088012695,
246
  "learning_rate": 2.488888888888889e-05,
247
- "loss": 1.4712,
248
  "step": 340
249
  },
250
  {
251
  "epoch": 2.5925925925925926,
252
- "grad_norm": 3.4725160598754883,
253
  "learning_rate": 2.414814814814815e-05,
254
- "loss": 1.366,
255
  "step": 350
256
  },
257
  {
258
  "epoch": 2.6666666666666665,
259
- "grad_norm": 3.5917716026306152,
260
  "learning_rate": 2.340740740740741e-05,
261
- "loss": 1.4706,
262
  "step": 360
263
  },
264
  {
265
  "epoch": 2.7407407407407405,
266
- "grad_norm": 2.643585205078125,
267
  "learning_rate": 2.2666666666666668e-05,
268
- "loss": 1.4946,
269
  "step": 370
270
  },
271
  {
272
  "epoch": 2.814814814814815,
273
- "grad_norm": 4.659608364105225,
274
  "learning_rate": 2.1925925925925926e-05,
275
- "loss": 1.4062,
276
  "step": 380
277
  },
278
  {
279
  "epoch": 2.888888888888889,
280
- "grad_norm": 3.32312273979187,
281
  "learning_rate": 2.1185185185185184e-05,
282
- "loss": 1.4974,
283
  "step": 390
284
  },
285
  {
286
  "epoch": 2.962962962962963,
287
- "grad_norm": 2.8320910930633545,
288
  "learning_rate": 2.0444444444444446e-05,
289
- "loss": 1.3854,
290
  "step": 400
291
  },
292
  {
293
  "epoch": 3.037037037037037,
294
- "grad_norm": 2.9114246368408203,
295
  "learning_rate": 1.9703703703703704e-05,
296
- "loss": 1.4369,
297
  "step": 410
298
  },
299
  {
300
  "epoch": 3.111111111111111,
301
- "grad_norm": 3.240769147872925,
302
  "learning_rate": 1.8962962962962963e-05,
303
- "loss": 1.343,
304
  "step": 420
305
  },
306
  {
307
  "epoch": 3.185185185185185,
308
- "grad_norm": 3.537137985229492,
309
  "learning_rate": 1.8222222222222224e-05,
310
- "loss": 1.3801,
311
  "step": 430
312
  },
313
  {
314
  "epoch": 3.259259259259259,
315
- "grad_norm": 3.054455518722534,
316
  "learning_rate": 1.7481481481481483e-05,
317
- "loss": 1.4533,
318
  "step": 440
319
  },
320
  {
321
  "epoch": 3.3333333333333335,
322
- "grad_norm": 4.251873016357422,
323
  "learning_rate": 1.674074074074074e-05,
324
- "loss": 1.4991,
325
  "step": 450
326
  },
327
  {
328
  "epoch": 3.4074074074074074,
329
- "grad_norm": 2.9473700523376465,
330
  "learning_rate": 1.6000000000000003e-05,
331
- "loss": 1.3662,
332
  "step": 460
333
  },
334
  {
335
  "epoch": 3.4814814814814814,
336
- "grad_norm": 3.284587860107422,
337
  "learning_rate": 1.5259259259259258e-05,
338
- "loss": 1.3832,
339
  "step": 470
340
  },
341
  {
342
  "epoch": 3.5555555555555554,
343
- "grad_norm": 3.0811917781829834,
344
  "learning_rate": 1.4518518518518521e-05,
345
- "loss": 1.4724,
346
  "step": 480
347
  },
348
  {
349
  "epoch": 3.6296296296296298,
350
- "grad_norm": 2.595721960067749,
351
  "learning_rate": 1.3777777777777778e-05,
352
- "loss": 1.3091,
353
  "step": 490
354
  },
355
  {
356
  "epoch": 3.7037037037037037,
357
- "grad_norm": 3.941594123840332,
358
  "learning_rate": 1.3037037037037036e-05,
359
- "loss": 1.371,
360
  "step": 500
361
  },
362
  {
363
  "epoch": 3.7777777777777777,
364
- "grad_norm": 3.5405843257904053,
365
  "learning_rate": 1.2296296296296298e-05,
366
- "loss": 1.3644,
367
  "step": 510
368
  },
369
  {
370
  "epoch": 3.851851851851852,
371
- "grad_norm": 2.9564130306243896,
372
  "learning_rate": 1.1555555555555556e-05,
373
- "loss": 1.458,
374
  "step": 520
375
  },
376
  {
377
  "epoch": 3.925925925925926,
378
- "grad_norm": 2.8802552223205566,
379
  "learning_rate": 1.0814814814814814e-05,
380
- "loss": 1.4408,
381
  "step": 530
382
  },
383
  {
384
  "epoch": 4.0,
385
- "grad_norm": 3.0077877044677734,
386
  "learning_rate": 1.0074074074074074e-05,
387
- "loss": 1.4497,
388
  "step": 540
389
  },
390
  {
391
  "epoch": 4.074074074074074,
392
- "grad_norm": 3.43784761428833,
393
  "learning_rate": 9.333333333333334e-06,
394
- "loss": 1.4439,
395
  "step": 550
396
  },
397
  {
398
  "epoch": 4.148148148148148,
399
- "grad_norm": 3.5418014526367188,
400
  "learning_rate": 8.592592592592593e-06,
401
- "loss": 1.4832,
402
  "step": 560
403
  },
404
  {
405
  "epoch": 4.222222222222222,
406
- "grad_norm": 3.1893768310546875,
407
  "learning_rate": 7.851851851851853e-06,
408
- "loss": 1.3177,
409
  "step": 570
410
  },
411
  {
412
  "epoch": 4.296296296296296,
413
- "grad_norm": 3.522493839263916,
414
  "learning_rate": 7.111111111111112e-06,
415
- "loss": 1.2925,
416
  "step": 580
417
  },
418
  {
419
  "epoch": 4.37037037037037,
420
- "grad_norm": 3.473977565765381,
421
  "learning_rate": 6.370370370370371e-06,
422
- "loss": 1.4198,
423
  "step": 590
424
  },
425
  {
426
  "epoch": 4.444444444444445,
427
- "grad_norm": 4.043973445892334,
428
  "learning_rate": 5.62962962962963e-06,
429
- "loss": 1.3265,
430
  "step": 600
431
  },
432
  {
433
  "epoch": 4.518518518518518,
434
- "grad_norm": 4.024613857269287,
435
  "learning_rate": 4.888888888888889e-06,
436
- "loss": 1.3528,
437
  "step": 610
438
  },
439
  {
440
  "epoch": 4.592592592592593,
441
- "grad_norm": 3.266040802001953,
442
  "learning_rate": 4.1481481481481485e-06,
443
- "loss": 1.3553,
444
  "step": 620
445
  },
446
  {
447
  "epoch": 4.666666666666667,
448
- "grad_norm": 5.175985813140869,
449
  "learning_rate": 3.4074074074074077e-06,
450
- "loss": 1.332,
451
  "step": 630
452
  },
453
  {
454
  "epoch": 4.7407407407407405,
455
- "grad_norm": 3.355964422225952,
456
  "learning_rate": 2.666666666666667e-06,
457
- "loss": 1.5673,
458
  "step": 640
459
  },
460
  {
461
  "epoch": 4.814814814814815,
462
- "grad_norm": 3.170093297958374,
463
  "learning_rate": 1.925925925925926e-06,
464
- "loss": 1.4978,
465
  "step": 650
466
  },
467
  {
468
  "epoch": 4.888888888888889,
469
- "grad_norm": 3.5066723823547363,
470
  "learning_rate": 1.1851851851851852e-06,
471
- "loss": 1.3474,
472
  "step": 660
473
  },
474
  {
475
  "epoch": 4.962962962962963,
476
- "grad_norm": 3.7922685146331787,
477
  "learning_rate": 4.444444444444445e-07,
478
- "loss": 1.3518,
479
  "step": 670
480
  }
481
  ],
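The learning_rate values in the log above follow the Trainer's linear decay from the configured 5e-05. A quick consistency check (assuming 675 total optimizer steps, as the checkpoint-675 name suggests, with the rate logged one step behind the reported step — both are inferences from this trainer_state, not read from a config):

```python
# Sketch: linear LR decay lr = lr0 * (T - t) / T, checked against two
# values from the log. T = 675 and the one-step logging offset are
# assumptions inferred from this trainer_state.
def linear_lr(lr0, t, total):
    return lr0 * max(0.0, (total - t) / total)

assert abs(linear_lr(5e-05, 9, 675) - 4.933333333333334e-05) < 1e-10    # logged at step 10
assert abs(linear_lr(5e-05, 669, 675) - 4.444444444444445e-07) < 1e-12  # logged at step 670
```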
@@ -496,7 +496,7 @@
496
  "attributes": {}
497
  }
498
  },
499
- "total_flos": 4042898119065600.0,
500
  "train_batch_size": 2,
501
  "trial_name": null,
502
  "trial_params": null
 
11
  "log_history": [
12
  {
13
  "epoch": 0.07407407407407407,
14
+ "grad_norm": 1.6033002138137817,
15
  "learning_rate": 4.933333333333334e-05,
16
+ "loss": 3.0632,
17
  "step": 10
18
  },
19
  {
20
  "epoch": 0.14814814814814814,
21
+ "grad_norm": 1.6290810108184814,
22
  "learning_rate": 4.8592592592592596e-05,
23
+ "loss": 2.7368,
24
  "step": 20
25
  },
26
  {
27
  "epoch": 0.2222222222222222,
28
+ "grad_norm": 2.151897430419922,
29
  "learning_rate": 4.7851851851851854e-05,
30
+ "loss": 2.3871,
31
  "step": 30
32
  },
33
  {
34
  "epoch": 0.2962962962962963,
35
+ "grad_norm": 1.8000338077545166,
36
  "learning_rate": 4.711111111111111e-05,
37
+ "loss": 1.9196,
38
  "step": 40
39
  },
40
  {
41
  "epoch": 0.37037037037037035,
42
+ "grad_norm": 1.8977078199386597,
43
  "learning_rate": 4.637037037037038e-05,
44
+ "loss": 1.3918,
45
  "step": 50
46
  },
47
  {
48
  "epoch": 0.4444444444444444,
49
+ "grad_norm": 1.895778775215149,
50
  "learning_rate": 4.5629629629629636e-05,
51
+ "loss": 1.5302,
52
  "step": 60
53
  },
54
  {
55
  "epoch": 0.5185185185185185,
56
+ "grad_norm": 2.118054151535034,
57
  "learning_rate": 4.4888888888888894e-05,
58
+ "loss": 1.5947,
59
  "step": 70
60
  },
61
  {
62
  "epoch": 0.5925925925925926,
63
+ "grad_norm": 1.488535761833191,
64
  "learning_rate": 4.414814814814815e-05,
65
+ "loss": 1.719,
66
  "step": 80
67
  },
68
  {
69
  "epoch": 0.6666666666666666,
70
+ "grad_norm": 1.6291440725326538,
71
  "learning_rate": 4.340740740740741e-05,
72
+ "loss": 1.5577,
73
  "step": 90
74
  },
75
  {
76
  "epoch": 0.7407407407407407,
77
+ "grad_norm": 1.8335853815078735,
78
  "learning_rate": 4.266666666666667e-05,
79
+ "loss": 1.5306,
80
  "step": 100
81
  },
82
  {
83
  "epoch": 0.8148148148148148,
84
+ "grad_norm": 1.6403965950012207,
85
  "learning_rate": 4.192592592592593e-05,
86
+ "loss": 1.4556,
87
  "step": 110
88
  },
89
  {
90
  "epoch": 0.8888888888888888,
91
+ "grad_norm": 2.472151279449463,
92
  "learning_rate": 4.1185185185185186e-05,
93
+ "loss": 1.3542,
94
  "step": 120
95
  },
96
  {
97
  "epoch": 0.9629629629629629,
98
+ "grad_norm": 2.03757643699646,
99
  "learning_rate": 4.0444444444444444e-05,
100
+ "loss": 1.4318,
101
  "step": 130
102
  },
103
  {
104
  "epoch": 1.037037037037037,
105
+ "grad_norm": 1.8082479238510132,
106
  "learning_rate": 3.97037037037037e-05,
107
+ "loss": 1.325,
108
  "step": 140
109
  },
110
  {
111
  "epoch": 1.1111111111111112,
112
+ "grad_norm": 1.9503273963928223,
113
  "learning_rate": 3.896296296296296e-05,
114
+ "loss": 1.4448,
115
  "step": 150
116
  },
117
  {
118
  "epoch": 1.1851851851851851,
119
+ "grad_norm": 1.9627147912979126,
120
  "learning_rate": 3.8222222222222226e-05,
121
+ "loss": 1.5878,
122
  "step": 160
123
  },
124
  {
125
  "epoch": 1.2592592592592593,
126
+ "grad_norm": 1.7511639595031738,
127
  "learning_rate": 3.7481481481481484e-05,
128
+ "loss": 1.3959,
129
  "step": 170
130
  },
131
  {
132
  "epoch": 1.3333333333333333,
133
+ "grad_norm": 2.0530567169189453,
134
  "learning_rate": 3.674074074074074e-05,
135
+ "loss": 1.3011,
136
  "step": 180
137
  },
138
  {
139
  "epoch": 1.4074074074074074,
140
+ "grad_norm": 2.0430173873901367,
141
  "learning_rate": 3.6e-05,
142
+ "loss": 1.4573,
143
  "step": 190
144
  },
145
  {
146
  "epoch": 1.4814814814814814,
147
+ "grad_norm": 2.0357518196105957,
148
  "learning_rate": 3.525925925925926e-05,
149
+ "loss": 1.1328,
150
  "step": 200
151
  },
152
  {
153
  "epoch": 1.5555555555555556,
154
+ "grad_norm": 1.7147893905639648,
155
  "learning_rate": 3.4518518518518524e-05,
156
+ "loss": 1.4789,
157
  "step": 210
158
  },
159
  {
160
  "epoch": 1.6296296296296298,
161
+ "grad_norm": 2.4516425132751465,
162
  "learning_rate": 3.377777777777778e-05,
163
+ "loss": 1.5025,
164
  "step": 220
165
  },
166
  {
167
  "epoch": 1.7037037037037037,
168
+ "grad_norm": 1.9009228944778442,
169
  "learning_rate": 3.303703703703704e-05,
170
+ "loss": 1.4632,
171
  "step": 230
172
  },
173
  {
174
  "epoch": 1.7777777777777777,
175
+ "grad_norm": 2.4635581970214844,
176
  "learning_rate": 3.22962962962963e-05,
177
+ "loss": 1.3317,
178
  "step": 240
179
  },
180
  {
181
  "epoch": 1.8518518518518519,
182
+ "grad_norm": 2.166893243789673,
183
  "learning_rate": 3.155555555555556e-05,
184
+ "loss": 1.4509,
185
  "step": 250
186
  },
187
  {
188
  "epoch": 1.925925925925926,
189
+ "grad_norm": 2.0209872722625732,
190
  "learning_rate": 3.0814814814814816e-05,
191
+ "loss": 1.3454,
192
  "step": 260
193
  },
194
  {
195
  "epoch": 2.0,
196
+ "grad_norm": 2.484250545501709,
197
  "learning_rate": 3.0074074074074078e-05,
198
+ "loss": 1.469,
199
  "step": 270
200
  },
201
  {
202
  "epoch": 2.074074074074074,
203
+ "grad_norm": 2.2359848022460938,
204
  "learning_rate": 2.9333333333333336e-05,
205
+ "loss": 1.4094,
206
  "step": 280
207
  },
208
  {
209
  "epoch": 2.148148148148148,
210
+ "grad_norm": 1.8419456481933594,
211
  "learning_rate": 2.8592592592592594e-05,
212
+ "loss": 1.3593,
213
  "step": 290
214
  },
215
  {
216
  "epoch": 2.2222222222222223,
217
+ "grad_norm": 2.260558605194092,
218
  "learning_rate": 2.7851851851851853e-05,
219
+ "loss": 1.2538,
220
  "step": 300
221
  },
222
  {
223
  "epoch": 2.2962962962962963,
224
+ "grad_norm": 2.419581890106201,
225
  "learning_rate": 2.7111111111111114e-05,
226
+ "loss": 1.3069,
227
  "step": 310
228
  },
229
  {
230
  "epoch": 2.3703703703703702,
231
+ "grad_norm": 1.992509126663208,
232
  "learning_rate": 2.6370370370370373e-05,
233
+ "loss": 1.3721,
234
  "step": 320
235
  },
236
  {
237
  "epoch": 2.4444444444444446,
238
+ "grad_norm": 1.7485105991363525,
239
  "learning_rate": 2.562962962962963e-05,
240
+ "loss": 1.4116,
241
  "step": 330
242
  },
243
  {
244
  "epoch": 2.5185185185185186,
245
+ "grad_norm": 2.112185478210449,
246
  "learning_rate": 2.488888888888889e-05,
247
+ "loss": 1.4882,
248
  "step": 340
249
  },
250
  {
251
  "epoch": 2.5925925925925926,
252
+ "grad_norm": 2.6426734924316406,
253
  "learning_rate": 2.414814814814815e-05,
254
+ "loss": 1.3898,
255
  "step": 350
256
  },
257
  {
258
  "epoch": 2.6666666666666665,
259
+ "grad_norm": 2.420663833618164,
260
  "learning_rate": 2.340740740740741e-05,
261
+ "loss": 1.3122,
262
  "step": 360
263
  },
264
  {
265
  "epoch": 2.7407407407407405,
266
+ "grad_norm": 2.674475908279419,
267
  "learning_rate": 2.2666666666666668e-05,
268
+ "loss": 1.4165,
269
  "step": 370
270
  },
271
  {
272
  "epoch": 2.814814814814815,
273
+ "grad_norm": 2.850975275039673,
274
  "learning_rate": 2.1925925925925926e-05,
275
+ "loss": 1.3389,
276
  "step": 380
277
  },
278
  {
279
  "epoch": 2.888888888888889,
280
+ "grad_norm": 2.469388246536255,
281
  "learning_rate": 2.1185185185185184e-05,
282
+ "loss": 1.2991,
283
  "step": 390
284
  },
285
  {
286
  "epoch": 2.962962962962963,
287
+ "grad_norm": 2.733851194381714,
288
  "learning_rate": 2.0444444444444446e-05,
289
+ "loss": 1.259,
290
  "step": 400
291
  },
292
  {
293
  "epoch": 3.037037037037037,
294
+ "grad_norm": 1.964146375656128,
295
  "learning_rate": 1.9703703703703704e-05,
296
+ "loss": 1.5459,
297
  "step": 410
298
  },
299
  {
300
  "epoch": 3.111111111111111,
301
+ "grad_norm": 2.0667080879211426,
302
  "learning_rate": 1.8962962962962963e-05,
303
+ "loss": 1.3024,
304
  "step": 420
305
  },
306
  {
307
  "epoch": 3.185185185185185,
308
+ "grad_norm": 2.3768820762634277,
309
  "learning_rate": 1.8222222222222224e-05,
310
+ "loss": 1.4858,
311
  "step": 430
312
  },
313
  {
314
  "epoch": 3.259259259259259,
315
+ "grad_norm": 3.4706430435180664,
316
  "learning_rate": 1.7481481481481483e-05,
317
+ "loss": 1.3467,
318
  "step": 440
319
  },
320
  {
321
  "epoch": 3.3333333333333335,
322
+ "grad_norm": 2.3406922817230225,
323
  "learning_rate": 1.674074074074074e-05,
324
+ "loss": 1.3619,
325
  "step": 450
326
  },
327
  {
328
  "epoch": 3.4074074074074074,
329
+ "grad_norm": 2.3285129070281982,
330
  "learning_rate": 1.6000000000000003e-05,
331
+ "loss": 1.4078,
332
  "step": 460
333
  },
334
  {
335
  "epoch": 3.4814814814814814,
336
+ "grad_norm": 2.5264031887054443,
337
  "learning_rate": 1.5259259259259258e-05,
338
+ "loss": 1.1562,
339
  "step": 470
340
  },
341
  {
342
  "epoch": 3.5555555555555554,
343
+ "grad_norm": 2.290501594543457,
344
  "learning_rate": 1.4518518518518521e-05,
345
+ "loss": 1.3399,
346
  "step": 480
347
  },
348
  {
349
  "epoch": 3.6296296296296298,
350
+ "grad_norm": 3.063209056854248,
351
  "learning_rate": 1.3777777777777778e-05,
352
+ "loss": 1.1793,
353
  "step": 490
354
  },
355
  {
356
  "epoch": 3.7037037037037037,
357
+ "grad_norm": 2.8260083198547363,
358
  "learning_rate": 1.3037037037037036e-05,
359
+ "loss": 1.4168,
360
  "step": 500
361
  },
362
  {
363
  "epoch": 3.7777777777777777,
364
+ "grad_norm": 2.5373244285583496,
365
  "learning_rate": 1.2296296296296298e-05,
366
+ "loss": 1.2364,
367
  "step": 510
368
  },
369
  {
370
  "epoch": 3.851851851851852,
371
+ "grad_norm": 2.6455769538879395,
372
  "learning_rate": 1.1555555555555556e-05,
373
+ "loss": 1.278,
374
  "step": 520
375
  },
376
  {
377
  "epoch": 3.925925925925926,
378
+ "grad_norm": 2.7349331378936768,
379
  "learning_rate": 1.0814814814814814e-05,
380
+ "loss": 1.3113,
381
  "step": 530
382
  },
383
  {
384
  "epoch": 4.0,
385
+ "grad_norm": 2.5275073051452637,
386
  "learning_rate": 1.0074074074074074e-05,
387
+ "loss": 1.2358,
388
  "step": 540
389
  },
390
  {
391
  "epoch": 4.074074074074074,
392
+ "grad_norm": 2.648723840713501,
393
  "learning_rate": 9.333333333333334e-06,
394
+ "loss": 1.3655,
395
  "step": 550
396
  },
397
  {
398
  "epoch": 4.148148148148148,
399
+ "grad_norm": 2.6446926593780518,
400
  "learning_rate": 8.592592592592593e-06,
401
+ "loss": 1.397,
402
  "step": 560
403
  },
404
  {
405
  "epoch": 4.222222222222222,
406
+ "grad_norm": 2.8394277095794678,
407
  "learning_rate": 7.851851851851853e-06,
408
+ "loss": 1.2936,
409
  "step": 570
410
  },
411
  {
412
  "epoch": 4.296296296296296,
413
+ "grad_norm": 2.7919442653656006,
414
  "learning_rate": 7.111111111111112e-06,
415
+ "loss": 1.207,
416
  "step": 580
417
  },
418
  {
419
  "epoch": 4.37037037037037,
420
+ "grad_norm": 2.8117082118988037,
421
  "learning_rate": 6.370370370370371e-06,
422
+ "loss": 1.1337,
423
  "step": 590
424
  },
425
  {
426
  "epoch": 4.444444444444445,
427
+ "grad_norm": 3.2036752700805664,
428
  "learning_rate": 5.62962962962963e-06,
429
+ "loss": 1.4294,
430
  "step": 600
431
  },
432
  {
433
  "epoch": 4.518518518518518,
434
+ "grad_norm": 2.448761224746704,
435
  "learning_rate": 4.888888888888889e-06,
436
+ "loss": 1.2775,
437
  "step": 610
438
  },
439
  {
440
  "epoch": 4.592592592592593,
441
+ "grad_norm": 2.883207082748413,
442
  "learning_rate": 4.1481481481481485e-06,
443
+ "loss": 1.2913,
444
  "step": 620
445
  },
446
  {
447
  "epoch": 4.666666666666667,
448
+ "grad_norm": 3.2061944007873535,
449
  "learning_rate": 3.4074074074074077e-06,
450
+ "loss": 1.3502,
451
  "step": 630
452
  },
453
  {
454
  "epoch": 4.7407407407407405,
455
+ "grad_norm": 2.664846181869507,
456
  "learning_rate": 2.666666666666667e-06,
457
+ "loss": 1.3651,
458
  "step": 640
459
  },
460
  {
461
  "epoch": 4.814814814814815,
462
+ "grad_norm": 2.967418909072876,
463
  "learning_rate": 1.925925925925926e-06,
464
+ "loss": 1.2735,
465
  "step": 650
466
  },
467
  {
468
  "epoch": 4.888888888888889,
469
+ "grad_norm": 2.67146372795105,
470
  "learning_rate": 1.1851851851851852e-06,
471
+ "loss": 1.2634,
472
  "step": 660
473
  },
474
  {
475
  "epoch": 4.962962962962963,
476
+ "grad_norm": 2.5436322689056396,
477
  "learning_rate": 4.444444444444445e-07,
478
+ "loss": 1.2519,
479
  "step": 670
480
  }
481
  ],
 
  "attributes": {}
  }
  },
+ "total_flos": 1.17089345470464e+16,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
checkpoint-675/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e97160594c45d89ea3fd9c68265c308064022978d7a1e8dc093e9bed35cf1cf7
+ oid sha256:c06b70c6d22e534aee54e60ea3091f1eeba55994a544d47464b4a805ef2ab30e
  size 5304
runs/Jun08_08-35-32_1af0c0439b63/events.out.tfevents.1749371737.1af0c0439b63.503.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b1b74493478c978c6c61600c221a675dd0467e6e0561fffd9a706aed1723483c
+ size 19595