End of training

README.md CHANGED
@@ -1,14 +1,14 @@
 ---
 library_name: transformers
 license: cc-by-nc-4.0
-base_model:
+base_model: hardlyworking/4Bcpt
 tags:
 - axolotl
 - generated_from_trainer
 datasets:
-- 
+- GreenerPastures/All-Your-Base-Full
 model-index:
-- name:
+- name: 4Brp
   results: []
 ---
 
@@ -20,21 +20,27 @@ should probably proofread and complete it, then remove this comment. -->
 
 axolotl version: `0.11.0.dev0`
 ```yaml
-base_model:
+base_model: hardlyworking/4Bcpt
 
 load_in_8bit: false
 load_in_4bit: false
 strict: false
 
+chat_template: chatml
 datasets:
-  - path:
-    type:
-
+  - path: GreenerPastures/All-Your-Base-Full
+    type: chat_template
+    split: train
+    field_messages: conversations
+    message_property_mappings:
+      role: from
+      content: value
+val_set_size: 0.02
 output_dir: ./outputs/out
 dataset_prepared_path: last_run_prepared
 shuffle_merged_datasets: true
 
-hub_model_id: hardlyworking/
+hub_model_id: hardlyworking/4Brp
 hub_strategy: "all_checkpoints"
 push_dataset_to_hub:
 hf_use_auth_token: true
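The `datasets` block added above maps ShareGPT-style records onto the ChatML template: `field_messages: conversations` selects the message list, and `message_property_mappings` renames each message's `from`/`value` keys to the `role`/`content` pairs the template consumes. A minimal sketch of that mapping in Python (the sample record is illustrative, not taken from the dataset):

```python
# Illustrative record in the ShareGPT layout the config assumes;
# the actual GreenerPastures/All-Your-Base-Full schema may differ.
record = {
    "conversations": [
        {"from": "human", "value": "Hello, who are you?"},
        {"from": "gpt", "value": "An assistant fine-tuned from a 4B base model."},
    ]
}

# message_property_mappings: role <- from, content <- value
messages = [
    {"role": m["from"], "content": m["value"]}
    for m in record["conversations"]
]

# ChatML wraps every turn in <|im_start|>/<|im_end|> markers.
# (Axolotl additionally normalizes role names such as "human"/"gpt"
# to "user"/"assistant"; that step is omitted here for brevity.)
chatml = "".join(
    f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
)
print(chatml)
```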
@@ -57,16 +63,16 @@ pad_to_sequence_len: true
 wandb_project: New4B
 wandb_entity:
 wandb_watch:
-wandb_name:
+wandb_name: New4Brp
 wandb_log_model:
 
-evals_per_epoch:
+evals_per_epoch: 8
 eval_table_size:
-eval_max_new_tokens:
+eval_max_new_tokens: 128
 
 gradient_accumulation_steps: 2
 micro_batch_size: 8
-num_epochs:
+num_epochs: 2
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
 learning_rate: 1e-5
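The batch and schedule settings above are consistent with the step counts reported under "Training hyperparameters" further down; a quick arithmetic check (assuming a single GPU, so the device count does not scale the batch):

```python
# Consistency check for the batch/step accounting (single-GPU assumption).
micro_batch_size = 8
gradient_accumulation_steps = 2
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 16  # matches the reported hyperparameter

training_steps = 1148  # reported under "Training hyperparameters"
num_epochs = 2
steps_per_epoch = training_steps // num_epochs  # 574

# evals_per_epoch: 8 -> an eval roughly every 574 / 8 ~= 72 steps,
# matching the Step column of the results table (72, 144, 216, ...).
eval_interval = round(steps_per_epoch / 8)
assert eval_interval == 72

# lr_scheduler_warmup_steps: 57 is ~5% of the total step budget.
print(f"warmup fraction: {57 / training_steps:.3f}")  # ~0.050
```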
@@ -102,9 +108,11 @@ special_tokens:
 
 </details><br>
 
-# 
+# 4Brp
 
-This model is a fine-tuned version of [
+This model is a fine-tuned version of [hardlyworking/4Bcpt](https://huggingface.co/hardlyworking/4Bcpt) on the GreenerPastures/All-Your-Base-Full dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.9183
 
 ## Model description
 
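Because training used the ChatML template, inference should go through the tokenizer's chat template rather than raw prompts. A minimal, untested usage sketch, assuming the chat template is saved with the tokenizer on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Untested sketch; assumes the ChatML template ships with the tokenizer.
model_id = "hardlyworking/4Brp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)  # mirrors eval_max_new_tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```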
@@ -131,11 +139,29 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 16
 - optimizer: 8-bit AdamW (OptimizerNames.ADAMW_BNB, via bitsandbytes) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps:
-- training_steps:
+- lr_scheduler_warmup_steps: 57
+- training_steps: 1148
 
 ### Training results
 
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| No log        | 0      | 0    | 1.1370          |
+| 1.0053        | 0.1253 | 72   | 0.9893          |
+| 0.9679        | 0.2507 | 144  | 0.9576          |
+| 0.966         | 0.3760 | 216  | 0.9440          |
+| 0.9397        | 0.5013 | 288  | 0.9358          |
+| 0.9563        | 0.6266 | 360  | 0.9300          |
+| 0.9034        | 0.7520 | 432  | 0.9259          |
+| 0.9214        | 0.8773 | 504  | 0.9230          |
+| 0.9155        | 1.0017 | 576  | 0.9211          |
+| 0.9072        | 1.1271 | 648  | 0.9198          |
+| 0.893         | 1.2524 | 720  | 0.9191          |
+| 0.91          | 1.3777 | 792  | 0.9186          |
+| 0.9649        | 1.5030 | 864  | 0.9184          |
+| 0.8838        | 1.6284 | 936  | 0.9183          |
+| 0.8856        | 1.7537 | 1008 | 0.9183          |
+| 0.9235        | 1.8790 | 1080 | 0.9183          |
 
 
 ### Framework versions