error577 committed
Commit 3cdd74b · verified · Parent: af9e516

End of training

Files changed (2):
  1. README.md +20 -12
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -47,7 +47,7 @@ flash_attention: true
  fp16: null
  fsdp: null
  fsdp_config: null
- gradient_accumulation_steps: 32
+ gradient_accumulation_steps: 8
  gradient_checkpointing: true
  group_by_length: false
  hub_model_id: error577/634fe6e7-ba15-40a0-84cd-c93ce43b7688
@@ -66,12 +66,11 @@ lora_model_dir: null
  lora_r: 32
  lora_target_linear: true
  lr_scheduler: cosine
- max_steps: 100
+ max_steps: 1000
  micro_batch_size: 1
- max_grad_norm: 2
  mlflow_experiment_name: /tmp/4404b6e6064c8d37_train_data.json
  model_type: AutoModelForCausalLM
- num_epochs: 1
+ num_epochs: 10
  optimizer: paged_adamw_32bit
  output_dir: miner_id_24
  pad_to_sequence_len: true
@@ -92,7 +91,7 @@ wandb_name: 279e5cfb-d198-4bf0-8895-5af873459233
  wandb_project: Gradients-On-Demand
  wandb_run: your_name
  wandb_runid: 279e5cfb-d198-4bf0-8895-5af873459233
- warmup_steps: 10
+ warmup_steps: 20
  weight_decay: 0.0
  xformers_attention: null

@@ -104,7 +103,7 @@ xformers_attention: null

  This model is a fine-tuned version of [Orenguteng/Llama-3-8B-Lexi-Uncensored](https://huggingface.co/Orenguteng/Llama-3-8B-Lexi-Uncensored) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 2.3612
+ - Loss: 3.2154

  ## Model description

@@ -127,19 +126,28 @@ The following hyperparameters were used during training:
  - train_batch_size: 1
  - eval_batch_size: 1
  - seed: 42
- - gradient_accumulation_steps: 32
- - total_train_batch_size: 32
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 8
  - optimizer: Use OptimizerNames.PAGED_ADAMW with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - training_steps: 50
+ - lr_scheduler_warmup_steps: 20
+ - training_steps: 1000

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 2.1426 | 0.9880 | 49 | 2.3614 |
- | 3.2277 | 1.0082 | 50 | 2.3612 |
+ | 4.3457 | 0.0050 | 1 | 4.1774 |
+ | 2.5186 | 0.5041 | 100 | 2.3878 |
+ | 2.131 | 1.0082 | 200 | 2.3347 |
+ | 1.6391 | 1.5123 | 300 | 2.4039 |
+ | 1.1056 | 2.0164 | 400 | 2.4127 |
+ | 1.2669 | 2.5205 | 500 | 2.6209 |
+ | 0.5887 | 3.0246 | 600 | 2.6697 |
+ | 0.5802 | 3.5287 | 700 | 2.9570 |
+ | 0.2365 | 4.0328 | 800 | 3.0128 |
+ | 0.3664 | 4.5369 | 900 | 3.2092 |
+ | 0.1177 | 5.0410 | 1000 | 3.2154 |


  ### Framework versions
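Side note on the edited values: in trainer configs like this one, the effective (total) train batch size is micro_batch_size × gradient_accumulation_steps, which is why lowering the accumulation steps from 32 to 8 changes total_train_batch_size to 8. A minimal sanity-check sketch of that arithmetic, using the values from this commit (the variable names are illustrative, not part of the config schema):

```python
# Sanity check of the updated hyperparameters; values are copied from the
# new config in this commit, and the variable names are illustrative only.
micro_batch_size = 1
gradient_accumulation_steps = 8   # lowered from 32
max_steps = 1000                  # raised from 100
warmup_steps = 20                 # raised from 10

# Effective batch size per optimizer step = micro batch * accumulation steps.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 8  # matches "total_train_batch_size: 8" above

# Fraction of training spent warming up the cosine schedule.
print(f"warmup fraction: {warmup_steps / max_steps:.1%}")  # -> 2.0%
```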
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f574e6bd1109ef08b0a9b0392a8ae0f154ee1f28fb011332ef0f952a78178286
+ oid sha256:6e65bc1ac6940eecb97aa11a07c09876dca746f71188060d7831d3b1aab228f9
  size 335706186
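For anyone picking up this checkpoint: the adapter_model.bin above is a LoRA adapter (lora_r: 32 in the config), so it is applied on top of the base model rather than loaded standalone. A minimal loading sketch with transformers and peft, assuming the repo contains a standard PEFT adapter config alongside the weights; the dtype and device choices here are illustrative, not taken from this commit:

```python
# Minimal sketch: load the base model, then apply this commit's LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Orenguteng/Llama-3-8B-Lexi-Uncensored"
adapter_id = "error577/634fe6e7-ba15-40a0-84cd-c93ce43b7688"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # applies the LoRA weights

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```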