AlekseyKorshuk committed
Commit 9224351 · verified · 1 Parent(s): 96cef03

End of training

README.md CHANGED
@@ -1,11 +1,14 @@
 ---
 license: mit
-base_model: microsoft/phi-2
+base_model: AlekseyKorshuk/ultrachat-phi-2-sft-chatml
 tags:
 - axolotl
+- dpo
+- trl
+- dpo
 - generated_from_trainer
 model-index:
-- name: ultrachat-phi-2-sft-chatml
+- name: ultrachat-phi-2-dpo-chatml
   results: []
 ---
 
@@ -17,30 +20,31 @@ should probably proofread and complete it, then remove this comment. -->
 
 axolotl version: `0.4.0`
 ```yaml
-base_model: microsoft/phi-2
+base_model: AlekseyKorshuk/ultrachat-phi-2-sft-chatml
 model_type: AutoModelForCausalLM
 tokenizer_type: AutoTokenizer
 trust_remote_code: true
 
-hub_model_id: AlekseyKorshuk/ultrachat-phi-2-sft-chatml
+hub_model_id: AlekseyKorshuk/ultrachat-phi-2-dpo-chatml
 hub_strategy: every_save
 
 load_in_8bit: false
 load_in_4bit: false
 strict: false
 
+rl: dpo
 datasets:
-  - path: AlekseyKorshuk/ultrachat_200k
-    split: train_sft
-    type: sharegpt
-    conversation: chatml
+  - path: argilla/ultrafeedback-binarized-preferences
+    split: train
+    type: chatml.argilla
+
 
 dataset_prepared_path:
-val_set_size: 0
+#val_set_size: 0.001
 output_dir: ./output
 
 sequence_len: 2048
-sample_packing: false
+#sample_packing: false # currently unsupported
 pad_to_sequence_len:
 
 lora_r:
@@ -53,12 +57,12 @@ lora_fan_in_fan_out:
 wandb_project: ui-thesis
 wandb_entity:
 wandb_watch:
-wandb_name: ultrachat-phi-2-sft-chatml
+wandb_name: ultrachat-phi-2-dpo-chatml
 wandb_log_model:
 
-gradient_accumulation_steps: 2
-micro_batch_size: 16
-num_epochs: 1
+gradient_accumulation_steps: 4
+micro_batch_size: 8
+num_epochs: 3
 optimizer: paged_adamw_8bit
 adam_beta1: 0.9
 adam_beta2: 0.95
@@ -66,9 +70,11 @@ max_grad_norm: 1.0
 adam_epsilon: 0.00001
 lr_scheduler: cosine
 cosine_min_lr_ratio: 0.1
-learning_rate: 4e-5
-warmup_ratio: 0.1
-weight_decay: 0.1
+learning_rate: 5.0e-7
+warmup_steps: 32
+#warmup_ratio: 0.1
+weight_decay: 0.01
+dpo_beta: 0.01
 
 train_on_inputs: false
 group_by_length: false
@@ -76,6 +82,7 @@ bf16: true
 fp16: false
 tf32: true
 
+
 gradient_checkpointing: true
 early_stopping_patience:
 resume_from_checkpoint:
@@ -85,34 +92,30 @@ xformers_attention:
 flash_attention: true
 
 
-evals_per_epoch: 0
-eval_table_size: 8 # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
-eval_table_max_new_tokens: 768 # Total number of tokens generated for predictions sent to wandb. Default is 128
-eval_sample_packing: false
+#evals_per_epoch: 5
+#eval_table_size: 8 # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
+#eval_table_max_new_tokens: 768 # Total number of tokens generated for predictions sent to wandb. Default is 128
 
 chat_template: chatml
-saves_per_epoch: 5
+#saves_per_epoch: 1
+save_steps: 500
 save_total_limit: 1
 seed: 42
 debug:
 deepspeed:
 
+
 fsdp:
 fsdp_config:
 resize_token_embeddings_to_32x: true
 
-special_tokens:
-  eos_token: "<|im_end|>"
-  pad_token: "<|endoftext|>"
-tokens:
-  - "<|im_start|>"
 ```
 
 </details><br>
 
-# ultrachat-phi-2-sft-chatml
+# ultrachat-phi-2-dpo-chatml
 
-This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the None dataset.
+This model is a fine-tuned version of [AlekseyKorshuk/ultrachat-phi-2-sft-chatml](https://huggingface.co/AlekseyKorshuk/ultrachat-phi-2-sft-chatml) on the None dataset.
 
 ## Model description
 
@@ -131,19 +134,19 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 4e-05
-- train_batch_size: 16
-- eval_batch_size: 16
+- learning_rate: 5e-07
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
-- gradient_accumulation_steps: 2
+- gradient_accumulation_steps: 4
 - total_train_batch_size: 128
-- total_eval_batch_size: 64
+- total_eval_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 36
-- num_epochs: 1
+- lr_scheduler_warmup_steps: 32
+- training_steps: 1492
 
 ### Training results
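This commit turns the SFT checkpoint into a DPO run: the base model becomes the SFT model itself, the dataset switches from `AlekseyKorshuk/ultrachat_200k` to `argilla/ultrafeedback-binarized-preferences`, and `rl: dpo` with `dpo_beta: 0.01` is enabled. For orientation, a minimal sketch of the DPO objective that `dpo_beta` scales; the function and tensor names are illustrative, and axolotl's RL trainer implements the production version:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.01) -> torch.Tensor:
    """DPO loss (Rafailov et al., 2023). Each input is the summed
    log-probability of a chosen/rejected response under the trained policy
    or the frozen reference (here, the SFT model)."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * reward margin); the small beta (0.01) keeps the
    # policy only loosely anchored to the SFT reference.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

The hyperparameter summary is consistent with the config: `micro_batch_size: 8` × `gradient_accumulation_steps: 4` × 4 GPUs gives the reported `total_train_batch_size` of 128.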
 
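A hypothetical usage sketch for the resulting checkpoint; the hub id, `trust_remote_code`, and the chatml template come from the config above, the rest is standard `transformers` (>= 4.34 for `apply_chat_template`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AlekseyKorshuk/ultrachat-phi-2-dpo-chatml"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    trust_remote_code=True,  # mirrors trust_remote_code: true in the config
)

# chat_template: chatml wraps each turn in <|im_start|>/<|im_end|> markers.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```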
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d421773a41c4ed4bccf69e07afeb8520d746baaeee593b17c1a59469311ad6e6
+oid sha256:37396b8bf1f4a7c7c449454aad7478e77ce2bbc222e08f5c8359e8920fb9ecdb
 size 4995584848
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ed4cd386f4b1347bb0db7914bd84e36c76f9cc31343922e670d9cbe7e13b07b3
+oid sha256:5cc8c2b8bc8603df77d58ca1162802cd00eee24ff7bd0e45df294506d9ca6e8b
 size 563833008
pytorch_model-00001-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bcc1db82364384bba0e0ce446fd6c9802d7d1c267ef4ee8851126a014cb41639
+oid sha256:8449c5668f971b92336be6e039e7130a4cac61207ec010dc43358eb08488ef54
 size 4995685647
pytorch_model-00002-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3f5034c2d3066bb7868859453966819ed69b84bb51ca159b68722ac6ec941f3f
+oid sha256:f00bbeb2093a3665f2469802bca3fec2286999cb6307e16db87b7c96269f4fa4
 size 563840390
runs/Jan27_14-25-22_ced685704e0d/events.out.tfevents.1706365758.ced685704e0d.6214.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:801750c7001deab906e201bd7b5e948c9cdb8f6b8dc6d79817807168a5977fd2
-size 637299
+oid sha256:198a29b0b8ff36d2308f88dcbc5ae37d5c4b52f04e26889e638e37b4e9ec8856
+size 949581
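The weight and log files above change only their Git LFS pointers: new `oid sha256:` digests with identical shard sizes, while the tfevents log grows from 637299 to 949581 bytes. A small sketch for verifying a downloaded shard against its pointer, assuming the file sits in the current directory:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-GB shards fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Expected digest taken from the updated LFS pointer above.
expected = "37396b8bf1f4a7c7c449454aad7478e77ce2bbc222e08f5c8359e8920fb9ecdb"
assert sha256_of("model-00001-of-00002.safetensors") == expected
```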