FrenzyBiscuit committed on
Commit 61cc55f · verified · 1 Parent(s): f6be070

Update README.md

Files changed (1)
  1. README.md +6 -299
README.md CHANGED
@@ -1,304 +1,11 @@
1
  ---
 
 
2
  library_name: transformers
3
- license: apache-2.0
4
- base_model: PocketDoc/Dans-SakuraKaze-V1.0.0-12b
5
- tags:
6
- - axolotl
7
- - generated_from_trainer
8
- datasets:
9
- - PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
10
- - PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
11
- - PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
12
- - PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
13
- - PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
14
- - PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
15
- - PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
16
- - PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
17
- - PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
18
- - PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
19
- model-index:
20
- - name: MN-2407-DSK-QwQify-v0.1-12B-LoRA-WS
21
- results: []
22
- ---
23
-
24
- # BeaverAI/MN-2407-DSK-QwQify-v0.1-12B
25
-
26
- [GGUF](https://huggingface.co/bartowski/BeaverAI_MN-2407-DSK-QwQify-v0.1-12B-GGUF)
27
-
28
- A test model that tries to give an existing model QwQ's thoughts. For this first version it is built on top of [`PocketDoc/Dans-SakuraKaze-V1.0.0-12b`](https://huggingface.co/PocketDoc/Dans-SakuraKaze-V1.0.0-12b) (an RP/adventure/co-writing model), which was trained on top of [`PocketDoc/Dans-PersonalityEngine-V1.1.0-12b`](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.1.0-12b) (a jack-of-all-trades instruct model), which was trained on top of [`mistralai/Mistral-Nemo-Base-2407`](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407).
29
-
30
- The prompt formatting and usage should be the same as with QwQ: use ChatML, and remove the thinking from previous turns. If thoughts aren't being generated automatically, add `<think>\n` to the start of the assistant turn.
31
-
32
- It should follow the formatting of previous model turns. On the first turns of the conversation you may need to regenerate a few times, and possibly edit the model responses, to get the formatting to your liking.
33
-
34
- You may want to disable inserting the `{{char}}:` prefix for the character, and instead add something like `Only speak as "{{char}}" in conversation with "{{user}}". Output your final response with a "{{char}}:" prefix.` to the end of your system prompt (see the sketch below).
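
As an illustration only (not part of the original card), here is a minimal Python sketch of that prompt layout, assuming plain ChatML with the `<think>\n` prefill; the system prompt text is a hypothetical example, and `{{char}}`/`{{user}}` are left as frontend placeholders:

```python
# Minimal sketch of the prompt layout described above.
# Thinking from earlier assistant turns is assumed to be stripped already;
# "<think>\n" is prefilled on the current assistant turn to trigger reasoning.
def build_prompt(system: str, history: list[tuple[str, str]], user_msg: str) -> str:
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for role, text in history:  # history with <think>...</think> blocks removed
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user_msg}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n<think>\n")  # prefill the thinking tag
    return "".join(parts)

# "{{char}}" / "{{user}}" are frontend placeholders, not literal names.
system_prompt = (
    "You are {{char}}, a curious traveler.\n"
    'Only speak as "{{char}}" in conversation with "{{user}}". '
    'Output your final response with a "{{char}}:" prefix.'
)
print(build_prompt(system_prompt, [], "Hello there!"))
```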
35
-
36
- ![image/png](https://i.imgur.com/loc5WQU.png)
37
-
38
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
39
- should probably proofread and complete it, then remove this comment. -->
40
-
41
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
42
- <details><summary>See axolotl config</summary>
43
-
44
- axolotl version: `0.8.0.dev0`
45
- ```yaml
46
- mlflow_tracking_uri: http://127.0.0.1:7860
47
- mlflow_experiment_name: MN-2407-DSK-QwQify-v0.1-12B-LoRA
48
-
49
- # Hugging Face saving config
50
- hub_model_id: BeaverAI/MN-2407-DSK-QwQify-v0.1-12B-LoRA-WS
51
- hub_strategy: every_save
52
-
53
- # Model checkpointing config
54
- output_dir: ./Outputs/MN-2407-DSK-QwQify-v0.1-12B-LoRA
55
- resume_from_checkpoint:
56
- save_steps: 25
57
- save_safetensors: true
58
- save_total_limit: 3
59
- save_only_model: false
60
-
61
- # Model architecture config
62
- base_model: PocketDoc/Dans-SakuraKaze-V1.0.0-12b
63
- model_type: MistralForCausalLM
64
- tokenizer_type: PreTrainedTokenizerFast
65
-
66
- # Mixed precision training config
67
- bf16: true
68
- fp16: false
69
- tf32: false
70
-
71
- # Model loading config
72
- load_in_8bit: false
73
- load_in_4bit: false
74
- strict: false
75
-
76
- # Sequence config
77
- sequence_len: 8192
78
- min_sample_len: 256
79
- sample_packing: true
80
- eval_sample_packing: true
81
- pad_to_sequence_len: true
82
- train_on_inputs: false
83
- group_by_length: false
84
-
85
- # LoRA adapter config
86
- adapter: lora
87
- lora_model_dir:
88
- lora_r: 128
89
- lora_alpha: 128
90
- lora_dropout: 0.125
91
- peft_layers_to_transform:
92
- peft_use_dora:
93
- peft_use_rslora:
94
- peft_layer_replication:
95
- lora_target_modules:
96
- - gate_proj
97
- - down_proj
98
- - up_proj
99
- - q_proj
100
- - v_proj
101
- - k_proj
102
- - o_proj
103
- lora_modules_to_save:
104
-
105
- # Fix uninitialized tokens (such as <|start_header_id|> on the base L3 models)
106
- fix_untrained_tokens:
107
-
108
- # Dataset config
109
- # https://github.com/xzuyn/axolotl/blob/came-plus-formatters/src/axolotl/prompt_strategies/customchatml-regex-last-only.py
110
- datasets:
111
- - path: PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
112
- split: train
113
- type: customchatml-regex-last-only
114
- - path: PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
115
- split: train
116
- type: customchatml-regex-last-only
117
- - path: PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
118
- split: train
119
- type: customchatml-regex-last-only
120
- - path: PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
121
- split: train
122
- type: customchatml-regex-last-only
123
- - path: PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
124
- split: train
125
- type: customchatml-regex-last-only
126
- - path: PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
127
- split: train
128
- type: customchatml-regex-last-only
129
- - path: PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
130
- split: train
131
- type: customchatml-regex-last-only
132
- - path: PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
133
- split: train
134
- type: customchatml-regex-last-only
135
- - path: PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
136
- split: train
137
- type: customchatml-regex-last-only
138
- - path: PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
139
- split: train
140
- type: customchatml-regex-last-only
141
- test_datasets:
142
- - path: PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
143
- split: test
144
- type: customchatml-regex-last-only
145
- - path: PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
146
- split: test
147
- type: customchatml-regex-last-only
148
- - path: PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
149
- split: test
150
- type: customchatml-regex-last-only
151
- - path: PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
152
- split: test
153
- type: customchatml-regex-last-only
154
- - path: PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
155
- split: test
156
- type: customchatml-regex-last-only
157
- - path: PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
158
- split: test
159
- type: customchatml-regex-last-only
160
- - path: PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
161
- split: test
162
- type: customchatml-regex-last-only
163
- - path: PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
164
- split: test
165
- type: customchatml-regex-last-only
166
- - path: PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
167
- split: test
168
- type: customchatml-regex-last-only
169
- - path: PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
170
- split: test
171
- type: customchatml-regex-last-only
172
- val_set_size: 0
173
- eval_strategy: steps
174
- eval_steps: 25
175
- dataset_prepared_path: ./00-Tokenized-Datasets/MN-2407-DSK-QwQify-v0.1-12B-customchatml-regex-last-only
176
- shuffle_merged_datasets: true
177
- dataset_processes:
178
-
179
- # Training hyperparameters
180
- num_epochs: 2
181
- gradient_accumulation_steps: 1
182
- micro_batch_size: 16 # x4 GPUs = 64
183
- eval_batch_size: 16 # x4 GPUs = 64
184
- warmup_steps: 0
185
- optimizer: came_pytorch
186
- optim_args:
187
- optim_target_modules:
188
- lr_scheduler: rex
189
- learning_rate: 2e-5
190
- cosine_min_lr_ratio:
191
- loraplus_lr_ratio:
192
- loraplus_lr_embedding:
193
- weight_decay: 0.1
194
- max_grad_norm: 1
195
- logging_steps: 1
196
-
197
- # Model optimization
198
- gradient_checkpointing: unsloth
199
- flash_attention: true
200
- plugins:
201
- - axolotl.integrations.liger.LigerPlugin
202
- - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
203
- cut_cross_entropy: true
204
- liger_rope: true
205
- liger_rms_norm: true
206
- liger_layer_norm: true
207
- liger_glu_activation: true
208
- liger_cross_entropy: false
209
- liger_fused_linear_cross_entropy: false
210
- lora_mlp_kernel: false
211
- lora_qkv_kernel: false
212
- lora_o_kernel: false
213
-
214
- # DeepSpeed
215
- deepspeed: deepspeed_configs/zero3_bf16.json
216
-
217
- # Garbage Collection
218
- gc_steps: 1
219
-
220
- # Debug config
221
- debug: true
222
- seed: 42
223
-
224
- # Token config
225
- special_tokens:
226
- bos_token: "<s>"
227
- eos_token: "<|im_end|>"
228
- pad_token: "<pad>"
229
- tokens:
230
-
231
- ```
232
-
233
- </details><br>
234
-
235
- # MN-2407-DSK-QwQify-v0.1-12B-LoRA-WS
236
-
237
- This model is a fine-tuned version of [PocketDoc/Dans-SakuraKaze-V1.0.0-12b](https://huggingface.co/PocketDoc/Dans-SakuraKaze-V1.0.0-12b) on the PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled and the PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled datasets.
238
- It achieves the following results on the evaluation set:
239
- - Loss: 1.2770
240
-
241
- ## Model description
242
-
243
- More information needed
244
-
245
- ## Intended uses & limitations
246
-
247
- More information needed
248
-
249
- ## Training and evaluation data
250
-
251
- More information needed
252
-
253
- ## Training procedure
254
-
255
- ### Training hyperparameters
256
-
257
- The following hyperparameters were used during training:
258
- - learning_rate: 2e-05
259
- - train_batch_size: 16
260
- - eval_batch_size: 16
261
- - seed: 42
262
- - distributed_type: multi-GPU
263
- - num_devices: 4
264
- - total_train_batch_size: 64
265
- - total_eval_batch_size: 64
266
- - optimizer: AdamW (OptimizerNames.ADAMW_HF) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
267
- - lr_scheduler_type: cosine
268
- - num_epochs: 2.0
269
-
270
- ### Training results
271
-
272
- | Training Loss | Epoch | Step | Validation Loss |
273
- |:-------------:|:------:|:----:|:---------------:|
274
- | 2.134 | 0.0038 | 1 | 2.0025 |
275
- | 1.6185 | 0.0951 | 25 | 1.5748 |
276
- | 1.5187 | 0.1901 | 50 | 1.4871 |
277
- | 1.4757 | 0.2852 | 75 | 1.4410 |
278
- | 1.4008 | 0.3802 | 100 | 1.4100 |
279
- | 1.4116 | 0.4753 | 125 | 1.3857 |
280
- | 1.357 | 0.5703 | 150 | 1.3630 |
281
- | 1.3435 | 0.6654 | 175 | 1.3478 |
282
- | 1.3332 | 0.7605 | 200 | 1.3353 |
283
- | 1.3042 | 0.8555 | 225 | 1.3308 |
284
- | 1.2993 | 0.9506 | 250 | 1.3228 |
285
- | 1.3105 | 1.0456 | 275 | 1.3154 |
286
- | 1.2782 | 1.1407 | 300 | 1.3094 |
287
- | 1.3063 | 1.2357 | 325 | 1.3070 |
288
- | 1.3003 | 1.3308 | 350 | 1.3005 |
289
- | 1.2937 | 1.4259 | 375 | 1.2952 |
290
- | 1.283 | 1.5209 | 400 | 1.2922 |
291
- | 1.2692 | 1.6160 | 425 | 1.2887 |
292
- | 1.2639 | 1.7110 | 450 | 1.2855 |
293
- | 1.2546 | 1.8061 | 475 | 1.2822 |
294
- | 1.2711 | 1.9011 | 500 | 1.2787 |
295
- | 1.2492 | 1.9962 | 525 | 1.2770 |
296
 
 
297
 
298
- ### Framework versions
299
 
300
- - PEFT 0.14.0
301
- - Transformers 4.49.0
302
- - Pytorch 2.6.0+cu124
303
- - Datasets 3.2.0
304
- - Tokenizers 0.21.1
 
1
  ---
2
+ base_model:
3
+ - BeaverAI/MN-2407-DSK-QwQify-v0.1-12B
4
  library_name: transformers
5
+ base_model_relation: quantized
6
 
7
+ ---
8
 
9
+ EXL2 Quants by FrenzyBiscuit.
10
 
11
+ This model is 5.0 BPW EXL2.
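
As a rough usage sketch (not from the card): loading the 5.0 BPW EXL2 weights with the `exllamav2` Python package, assuming it is installed and the repository has been downloaded locally; the directory path and generation settings below are placeholders:

```python
# Rough sketch: load this EXL2 quant with exllamav2 (assumed installed via pip).
# The model directory is a placeholder for a local snapshot of this repository.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("./MN-2407-DSK-QwQify-v0.1-12B-EXL2-5.0bpw")  # placeholder path
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)            # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
# ChatML prompt with "<think>\n" prefilled, as described in the original model card.
prompt = "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n<think>\n"
print(generator.generate(prompt=prompt, max_new_tokens=256))
```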