hardlyworking committed
Commit a3ed279 · verified · 1 Parent(s): d370920

End of training

Files changed (1)
  1. README.md +42 -16
README.md CHANGED
@@ -1,14 +1,14 @@
 ---
 library_name: transformers
 license: cc-by-nc-4.0
-base_model: Salesforce/xgen-small-4B-base-r
+base_model: hardlyworking/4Bcpt
 tags:
 - axolotl
 - generated_from_trainer
 datasets:
-- Mielikki/Erebus-87k
+- GreenerPastures/All-Your-Base-Full
 model-index:
-- name: 4Bcpt
+- name: 4Brp
   results: []
 ---
 
@@ -20,21 +20,27 @@ should probably proofread and complete it, then remove this comment. -->
 
 axolotl version: `0.11.0.dev0`
 ```yaml
-base_model: Salesforce/xgen-small-4B-base-r
+base_model: hardlyworking/4Bcpt
 
 load_in_8bit: false
 load_in_4bit: false
 strict: false
 
+chat_template: chatml
 datasets:
-  - path: Mielikki/Erebus-87k
-    type: completion
-    field: body
+  - path: GreenerPastures/All-Your-Base-Full
+    type: chat_template
+    split: train
+    field_messages: conversations
+    message_property_mappings:
+      role: from
+      content: value
+val_set_size: 0.02
 output_dir: ./outputs/out
 dataset_prepared_path: last_run_prepared
 shuffle_merged_datasets: true
 
-hub_model_id: hardlyworking/4Bcpt
+hub_model_id: hardlyworking/4Brp
 hub_strategy: "all_checkpoints"
 push_dataset_to_hub:
 hf_use_auth_token: true
@@ -57,16 +63,16 @@ pad_to_sequence_len: true
 wandb_project: New4B
 wandb_entity:
 wandb_watch:
-wandb_name: New4Bcpt
+wandb_name: New4Brp
 wandb_log_model:
 
-evals_per_epoch:
+evals_per_epoch: 8
 eval_table_size:
-eval_max_new_tokens:
+eval_max_new_tokens: 128
 
 gradient_accumulation_steps: 2
 micro_batch_size: 8
-num_epochs: 1
+num_epochs: 2
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
 learning_rate: 1e-5
@@ -102,9 +108,11 @@ special_tokens:
 
 </details><br>
 
-# 4Bcpt
+# 4Brp
 
-This model is a fine-tuned version of [Salesforce/xgen-small-4B-base-r](https://huggingface.co/Salesforce/xgen-small-4B-base-r) on the Mielikki/Erebus-87k dataset.
+This model is a fine-tuned version of [hardlyworking/4Bcpt](https://huggingface.co/hardlyworking/4Bcpt) on the GreenerPastures/All-Your-Base-Full dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.9183
 
 ## Model description
 
@@ -131,11 +139,29 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 16
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 18
-- training_steps: 374
+- lr_scheduler_warmup_steps: 57
+- training_steps: 1148
 
 ### Training results
 
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| No log | 0 | 0 | 1.1370 |
+| 1.0053 | 0.1253 | 72 | 0.9893 |
+| 0.9679 | 0.2507 | 144 | 0.9576 |
+| 0.966 | 0.3760 | 216 | 0.9440 |
+| 0.9397 | 0.5013 | 288 | 0.9358 |
+| 0.9563 | 0.6266 | 360 | 0.9300 |
+| 0.9034 | 0.7520 | 432 | 0.9259 |
+| 0.9214 | 0.8773 | 504 | 0.9230 |
+| 0.9155 | 1.0017 | 576 | 0.9211 |
+| 0.9072 | 1.1271 | 648 | 0.9198 |
+| 0.893 | 1.2524 | 720 | 0.9191 |
+| 0.91 | 1.3777 | 792 | 0.9186 |
+| 0.9649 | 1.5030 | 864 | 0.9184 |
+| 0.8838 | 1.6284 | 936 | 0.9183 |
+| 0.8856 | 1.7537 | 1008 | 0.9183 |
+| 0.9235 | 1.8790 | 1080 | 0.9183 |
 
 
 ### Framework versions
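
For readers of the config diff above: the new `datasets` block tells axolotl to read each record's `conversations` list and map the `from`/`value` keys onto `role`/`content` before rendering the conversation with the ChatML template. The sketch below illustrates that mapping; it is not code from this repository, and the sample record plus the role normalization (`human` → `user`, `gpt` → `assistant`) are assumptions about a ShareGPT-style dataset layout.

```python
# Illustrative sketch of the `message_property_mappings` block in the config above:
# a ShareGPT-style turn {"from": ..., "value": ...} becomes {"role": ..., "content": ...}
# and the conversation is rendered with ChatML delimiters (`chat_template: chatml`).
# The sample record and ROLE_MAP normalization are assumptions, not repo code.

record = {
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "human", "value": "Hello!"},
        {"from": "gpt", "value": "Hi there. How can I help?"},
    ]
}

ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def to_messages(rec):
    """Apply the role: from / content: value mapping from the dataset config."""
    return [
        {"role": ROLE_MAP.get(t["from"], t["from"]), "content": t["value"]}
        for t in rec["conversations"]
    ]

def render_chatml(messages):
    """Render messages with ChatML start/end markers."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

print(render_chatml(to_messages(record)))
```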
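As a quick consistency check on the schedule reported above (illustrative arithmetic only; the exact step count depends on the tokenized and packed dataset length, which the diff does not show):

```python
# Rough check of the reported schedule; the input values are taken from the diff above.
micro_batch_size = 8
gradient_accumulation_steps = 2
num_epochs = 2
training_steps = 1148
warmup_steps = 57

effective_batch = micro_batch_size * gradient_accumulation_steps  # 16, matching total_train_batch_size
steps_per_epoch = training_steps // num_epochs                     # 574 optimizer steps per epoch
warmup_fraction = warmup_steps / training_steps                    # roughly 0.05 of the run

print(effective_batch, steps_per_epoch, round(warmup_fraction, 3))
```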
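A short usage sketch for the resulting checkpoint. This is not from the model card: it assumes the uploaded tokenizer for `hardlyworking/4Brp` carries the ChatML chat template used during training, and that `accelerate` is installed for `device_map="auto"`. If the template is missing, the prompt can be built manually as in the mapping sketch above.

```python
# Hedged usage sketch: load hardlyworking/4Brp and generate one reply via the
# tokenizer's chat template. Assumes the saved tokenizer ships a ChatML template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hardlyworking/4Brp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Write a two-sentence opening for a mystery story."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# max_new_tokens=128 mirrors eval_max_new_tokens in the training config.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```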