Heralax committed on
Commit ed180bb · verified · 1 Parent(s): 52d4198

End of training

Files changed (2)
  1. README.md +259 -0
  2. generation_config.json +7 -0
README.md ADDED
@@ -0,0 +1,259 @@
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: Heralax/datagen-pretrain-v1-7b-mistralv0.2
+ tags:
+ - axolotl
+ - generated_from_trainer
+ datasets:
+ - 29_mil_asstr.jsonl
+ - 40mil_gutenberg.jsonl
+ - hle-1_formatted_2mil.jsonl
+ - 11_mil_fineweb.jsonl
+ - multiturn_segments_shard_01.json
+ - multiturn_segments_shard_02.json
+ - singleturn_segments_shard_01.json
+ - singleturn_segments_shard_02.json
+ - openhermes2_5_shard_01.json
+ - openhermes2_5_shard_02.json
+ - openthoughts-1.parquet
+ - openthoughts-2.parquet
+ - qwq_10million.jsonl
+ - bluemoon-6mil.json
+ model-index:
+ - name: datagen-sft-1
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.10.0.dev0`
+ ```yaml
+ base_model: Heralax/datagen-pretrain-v1-7b-mistralv0.2
+ tokenizer_type: AutoTokenizer
+ model_type: AutoModelForCausalLM
+ is_mistral_derived_model: true
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+ - path: 29_mil_asstr.jsonl
+   ds_type: json
+   type: completion
+ - path: 40mil_gutenberg.jsonl
+   type: completion
+ - path: hle-1_formatted_2mil.jsonl
+   type: completion
+ - path: 11_mil_fineweb.jsonl
+   type: completion
+ - path: multiturn_segments_shard_01.json
+   type: input_output
+ - path: multiturn_segments_shard_02.json
+   type: input_output
+ - path: singleturn_segments_shard_01.json
+   type: input_output
+ - path: singleturn_segments_shard_02.json
+   type: input_output
+ - path: openhermes2_5_shard_01.json
+   type: chat_template
+   chat_template: chatml
+   field_messages: conversations
+   message_field_role: from
+   message_field_content: value
+   roles:
+     user:
+       - human
+     assistant:
+       - gpt
+     system:
+       - system
+ - path: openhermes2_5_shard_02.json
+   type: chat_template
+   chat_template: chatml
+   field_messages: conversations
+   message_field_role: from
+   message_field_content: value
+   roles:
+     user:
+       - human
+     assistant:
+       - gpt
+     system:
+       - system
+ - path: openthoughts-1.parquet
+   type: chat_template
+   chat_template: chatml
+   field_messages: conversations
+   message_field_role: from
+   message_field_content: value
+   roles:
+     user:
+       - user
+     assistant:
+       - assistant
+     system:
+       - system
+ - path: openthoughts-2.parquet
+   type: chat_template
+   chat_template: chatml
+   field_messages: conversations
+   message_field_role: from
+   message_field_content: value
+   roles:
+     user:
+       - user
+     assistant:
+       - assistant
+     system:
+       - system
+ - path: qwq_10million.jsonl
+   type: chat_template
+   chat_template: chatml
+   field_messages: conversations
+   message_field_role: from
+   message_field_content: value
+   roles:
+     user:
+       - human
+     assistant:
+       - gpt
+     system:
+       - system
+ - path: bluemoon-6mil.json
+   type: chat_template
+   chat_template: chatml
+   field_messages: conversations
+   message_field_role: from
+   message_field_content: value
+   roles:
+     user:
+       - human
+     assistant:
+       - gpt
+     system:
+       - system
+ dataset_prepared_path: last_run_prepared
+ output_dir: ./datagen-pretrain-v1-7b-mistralv0.2
+ seed: 11037
+ hub_model_id: datagen-sft-1
+ hub_strategy: every_save
+
+ sequence_len: 20000
+ sample_packing: true
+ pad_to_sequence_len: false
+ shuffle_merged_datasets: true
+
+ wandb_project: datagen-pretrain-v1-7b-mistralv0.2
+ wandb_entity:
+ wandb_watch:
+ wandb_run_id:
+ wandb_log_model:
+
+
+ gradient_accumulation_steps: 50
+ micro_batch_size: 3
+ eval_batch_size: 1
+ num_epochs: 2
+ optimizer: paged_adamw_8bit
+ lr_scheduler: constant
+ learning_rate: 0.000020
+ weight_decay: 0
+ train_on_inputs: true
+ group_by_length: false
+ bf16: true
+ fp16: false
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention: false # faster
+ flash_attention: true # slower than xformers
+
+ chat_template: chatml
+
+ # warmup_ratio: 0.5
+ # warmup_steps: 0
+ auto_resume_from_checkpoints: false
+ warmup_ratio: 0.1
+ evals_per_epoch: 1
+ eval_batch_size: 4
+ val_set_size: 0.01
+ save_steps: 1000
+ eval_sample_packing: false
+ save_total_limit: 2 # NOTE you can afford many more saves with this config due to not storing optimizer states like with normal ones I think.
+ debug:
+ special_tokens:
+   pad_token: "<unk>"
+
+ use_liger_kernel: true
+
+
+ plugins:
+ - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_glu_activation: true
+ liger_layer_norm: true
+ liger_fused_linear_cross_entropy: true
+
+
+ ```
+
+ </details><br>
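The chat-formatted datasets above are mapped through `field_messages: conversations` with `from`/`value` keys. As a rough illustration only (the actual training files are not included in this repo), one ShareGPT-style record compatible with that mapping could look like the following sketch:

```python
# Hypothetical record matching the chat_template mapping in the config above
# (field_messages: conversations, message_field_role: from, message_field_content: value).
# This is an illustrative sketch, not a sample from the actual training data.
example_record = {
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "human", "value": "Explain sample packing in one sentence."},
        {"from": "gpt", "value": "Sample packing concatenates several short examples into one training sequence so fewer tokens are wasted on padding."},
    ]
}
```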
+
+ # datagen-sft-1
+
+ This model is a fine-tuned version of [Heralax/datagen-pretrain-v1-7b-mistralv0.2](https://huggingface.co/Heralax/datagen-pretrain-v1-7b-mistralv0.2) on the following datasets: 29_mil_asstr.jsonl, 40mil_gutenberg.jsonl, hle-1_formatted_2mil.jsonl, 11_mil_fineweb.jsonl, multiturn_segments_shard_01.json, multiturn_segments_shard_02.json, singleturn_segments_shard_01.json, singleturn_segments_shard_02.json, openhermes2_5_shard_01.json, openhermes2_5_shard_02.json, openthoughts-1.parquet, openthoughts-2.parquet, qwq_10million.jsonl, and bluemoon-6mil.json.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6304
+
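Assuming the reported loss is the usual mean per-token cross-entropy in nats, it corresponds to a validation perplexity of roughly 1.88:

```python
import math

eval_loss = 0.6304  # validation loss reported above
perplexity = math.exp(eval_loss)  # assumes mean per-token cross-entropy in nats
print(round(perplexity, 2))  # 1.88
```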
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
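No usage guidance is given above, so here is a minimal inference sketch as a starting point. It assumes the checkpoint is published as `Heralax/datagen-sft-1` (a guess based on `hub_model_id` and the committer's namespace) and that the ChatML template configured during training was saved with the tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id (hub_model_id: datagen-sft-1, committed by Heralax).
model_id = "Heralax/datagen-sft-1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a synthetic-data generation assistant."},
    {"role": "user", "content": "Write three question-answer pairs about photosynthesis."},
]
# chat_template: chatml in the training config; apply_chat_template emits that
# format only if the template was actually saved alongside the tokenizer.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```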
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 3
+ - eval_batch_size: 4
+ - seed: 11037
+ - gradient_accumulation_steps: 50
+ - total_train_batch_size: 150 (see the quick check below)
+ - optimizer: paged_adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
+ - lr_scheduler_type: constant
+ - lr_scheduler_warmup_steps: 111
+ - num_epochs: 2.0
+
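The total train batch size follows directly from the per-device batch size and the gradient accumulation steps; a quick sanity check, assuming a single training device (which the reported total implies):

```python
micro_batch_size = 3             # train_batch_size above
gradient_accumulation_steps = 50
num_devices = 1                  # assumption: the reported total implies one device

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)    # 150, matching the reported value
```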
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.4533 | 0.0018 | 1 | 2.4612 |
+ | 0.5531 | 0.9999 | 558 | 0.6706 |
+ | 0.5148 | 1.9981 | 1116 | 0.6304 |
+
+
+ ### Framework versions
+
+ - Transformers 4.51.3
+ - PyTorch 2.6.0+cu124
+ - Datasets 3.5.0
+ - Tokenizers 0.21.1
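To match this environment before debugging any discrepancies, a quick version check along these lines may help (a convenience sketch, not part of the training setup):

```python
import transformers, torch, datasets, tokenizers

# Versions reported in this card; mismatches are worth noting before comparing results.
expected = {
    "transformers": "4.51.3",
    "torch": "2.6.0",        # CUDA build reported as 2.6.0+cu124
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card reports {want}")
```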
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "do_sample": true,
+   "eos_token_id": 2,
+   "transformers_version": "4.51.3"
+ }
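These values are the defaults that `generate()` falls back to when no overrides are passed. A small sketch of reading them (repo id again assumed, as above):

```python
from transformers import GenerationConfig

# Assumed repo id; adjust to wherever this checkpoint is hosted.
gen_config = GenerationConfig.from_pretrained("Heralax/datagen-sft-1")

# The file above enables sampling by default and sets BOS/EOS token ids 1 and 2.
print(gen_config.do_sample, gen_config.bos_token_id, gen_config.eos_token_id)  # True 1 2

# Call-time arguments such as model.generate(..., do_sample=False) take
# precedence over these stored defaults.
```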