End of training

- README.md +20 -12
- adapter_model.bin +1 -1

README.md CHANGED
@@ -47,7 +47,7 @@ flash_attention: true
 fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps:
+gradient_accumulation_steps: 8
 gradient_checkpointing: true
 group_by_length: false
 hub_model_id: error577/634fe6e7-ba15-40a0-84cd-c93ce43b7688
@@ -66,12 +66,11 @@ lora_model_dir: null
 lora_r: 32
 lora_target_linear: true
 lr_scheduler: cosine
-max_steps:
+max_steps: 1000
 micro_batch_size: 1
-max_grad_norm: 2
 mlflow_experiment_name: /tmp/4404b6e6064c8d37_train_data.json
 model_type: AutoModelForCausalLM
-num_epochs:
+num_epochs: 10
 optimizer: paged_adamw_32bit
 output_dir: miner_id_24
 pad_to_sequence_len: true
@@ -92,7 +91,7 @@ wandb_name: 279e5cfb-d198-4bf0-8895-5af873459233
 wandb_project: Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: 279e5cfb-d198-4bf0-8895-5af873459233
-warmup_steps:
+warmup_steps: 20
 weight_decay: 0.0
 xformers_attention: null
 
@@ -104,7 +103,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [Orenguteng/Llama-3-8B-Lexi-Uncensored](https://huggingface.co/Orenguteng/Llama-3-8B-Lexi-Uncensored) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss:
+- Loss: 3.2154
 
 ## Model description
 
@@ -127,19 +126,28 @@ The following hyperparameters were used during training:
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
-- gradient_accumulation_steps:
-- total_train_batch_size:
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 8
 - optimizer: Use OptimizerNames.PAGED_ADAMW with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps:
-- training_steps:
+- lr_scheduler_warmup_steps: 20
+- training_steps: 1000
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-
-
+| 4.3457 | 0.0050 | 1 | 4.1774 |
+| 2.5186 | 0.5041 | 100 | 2.3878 |
+| 2.131 | 1.0082 | 200 | 2.3347 |
+| 1.6391 | 1.5123 | 300 | 2.4039 |
+| 1.1056 | 2.0164 | 400 | 2.4127 |
+| 1.2669 | 2.5205 | 500 | 2.6209 |
+| 0.5887 | 3.0246 | 600 | 2.6697 |
+| 0.5802 | 3.5287 | 700 | 2.9570 |
+| 0.2365 | 4.0328 | 800 | 3.0128 |
+| 0.3664 | 4.5369 | 900 | 3.2092 |
+| 0.1177 | 5.0410 | 1000 | 3.2154 |
 
 
 ### Framework versions
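
The filled-in hyperparameters are internally consistent; a minimal sketch of the arithmetic (the config and table values are copied from the diff above, while the training-set size is an inference from the epoch/step ratio, not something the card states):

```python
# Arithmetic behind the card's hyperparameters (values from the diff above).
micro_batch_size = 1
gradient_accumulation_steps = 8

# One optimizer step consumes micro_batch_size * gradient_accumulation_steps
# samples, which is how the card arrives at total_train_batch_size: 8.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 8

# The results table pairs step 1000 with epoch 5.0410, so:
steps_per_epoch = 1000 / 5.0410                          # ~198.4 steps/epoch
dataset_size = steps_per_epoch * total_train_batch_size  # ~1587 samples (inferred)

# max_steps: 1000 therefore stops the run after ~5 of the configured num_epochs: 10.
print(f"inferred training-set size: ~{dataset_size:.0f} samples")
```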
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6e65bc1ac6940eecb97aa11a07c09876dca746f71188060d7831d3b1aab228f9
 size 335706186
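
The weights file is tracked with Git LFS, so the diff only touches the pointer (version, oid, size); this commit swaps in the hash of the final adapter. A minimal sketch for checking a downloaded copy against the pointer (the local path is hypothetical):

```python
import hashlib

# Pointer values from the diff above.
expected_oid = "6e65bc1ac6940eecb97aa11a07c09876dca746f71188060d7831d3b1aab228f9"
expected_size = 335706186  # bytes, ~320 MiB

h = hashlib.sha256()
size = 0
with open("adapter_model.bin", "rb") as f:  # hypothetical local path
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
        size += len(chunk)

assert size == expected_size, "size mismatch with the LFS pointer"
assert h.hexdigest() == expected_oid, "sha256 mismatch with the LFS pointer"
```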
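
Given the config above (lora_r: 32, lora_target_linear: true), this artifact is a LoRA adapter for Orenguteng/Llama-3-8B-Lexi-Uncensored rather than a standalone model. A sketch of loading it with transformers + peft; the library calls are standard, but the dtype and device settings are assumptions, not taken from the card:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Orenguteng/Llama-3-8B-Lexi-Uncensored"
adapter_id = "error577/634fe6e7-ba15-40a0-84cd-c93ce43b7688"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption; the card only shows fp16: null
    device_map="auto",
)

# Attach the r=32 adapter trained in this run (weights from adapter_model.bin).
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```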