satyaalmasian committed
Commit 708dc5e · verified · 1 Parent(s): 2edf586

Model save

README.md ADDED
@@ -0,0 +1,109 @@
+ ---
+ base_model: microsoft/Phi-3-mini-4k-instruct
+ library_name: peft
+ license: mit
+ tags:
+ - trl
+ - sft
+ - generated_from_trainer
+ model-index:
+ - name: hf_phi3_lora
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/hmosousa/huggingface/runs/jy8rtirf)
+ # hf_phi3_lora
+
+ This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.3171
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 32
+ - total_train_batch_size: 512
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 1000
+ - num_epochs: 10
+
+ ### Training results
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:-----:|:---------------:|
+ | 1.4828 | 0.1489 | 500 | 1.4306 |
+ | 1.4047 | 0.2978 | 1000 | 1.3980 |
+ | 1.3611 | 0.4468 | 1500 | 1.3835 |
+ | 1.3653 | 0.5957 | 2000 | 1.3709 |
+ | 1.3171 | 0.7446 | 2500 | 1.3665 |
+ | 1.3089 | 0.8935 | 3000 | 1.3626 |
+ | 1.312 | 1.0425 | 3500 | 1.3608 |
+ | 1.2771 | 1.1914 | 4000 | 1.3556 |
+ | 1.3031 | 1.3403 | 4500 | 1.3570 |
+ | 1.284 | 1.4892 | 5000 | 1.3508 |
+ | 1.2697 | 1.6382 | 5500 | 1.3477 |
+ | 1.2594 | 1.7871 | 6000 | 1.3453 |
+ | 1.254 | 1.9360 | 6500 | 1.3413 |
+ | 1.2652 | 2.0849 | 7000 | 1.3426 |
+ | 1.2529 | 2.2338 | 7500 | 1.3435 |
+ | 1.2544 | 2.3828 | 8000 | 1.3382 |
+ | 1.2511 | 2.5317 | 8500 | 1.3396 |
+ | 1.2548 | 2.6806 | 9000 | 1.3361 |
+ | 1.2483 | 2.8295 | 9500 | 1.3351 |
+ | 1.2442 | 2.9785 | 10000 | 1.3382 |
+ | 1.2426 | 3.1274 | 10500 | 1.3344 |
+ | 1.2265 | 3.2763 | 11000 | 1.3361 |
+ | 1.2255 | 3.4252 | 11500 | 1.3356 |
+ | 1.2269 | 3.5742 | 12000 | 1.3314 |
+ | 1.2396 | 3.7231 | 12500 | 1.3298 |
+ | 1.2303 | 3.8720 | 13000 | 1.3260 |
+ | 1.2254 | 4.0209 | 13500 | 1.3277 |
+ | 1.2277 | 4.1698 | 14000 | 1.3272 |
+ | 1.2295 | 4.3188 | 14500 | 1.3240 |
+ | 1.2375 | 4.4677 | 15000 | 1.3288 |
+ | 1.2038 | 4.6166 | 15500 | 1.3224 |
+ | 1.2322 | 4.7655 | 16000 | 1.3214 |
+ | 1.2015 | 4.9145 | 16500 | 1.3246 |
+ | 1.208 | 5.0634 | 17000 | 1.3216 |
+ | 1.2248 | 5.2123 | 17500 | 1.3193 |
+ | 1.2155 | 5.3612 | 18000 | 1.3249 |
+ | 1.2194 | 5.5102 | 18500 | 1.3183 |
+ | 1.2185 | 5.6591 | 19000 | 1.3196 |
+ | 1.2119 | 5.8080 | 19500 | 1.3142 |
+ | 1.2171 | 5.9569 | 20000 | 1.3240 |
+ | 1.21 | 6.1058 | 20500 | 1.3235 |
+ | 1.19 | 6.2548 | 21000 | 1.3171 |
+
+
+ ### Framework versions
+
+ - PEFT 0.9.0
+ - Transformers 4.43.0.dev0
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.18.0
+ - Tokenizers 0.19.1
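
The batch-size figures in the README.md hunk can be cross-checked with a little arithmetic. The sketch below is not part of the commit: the function names are illustrative, and the samples-per-epoch value is only an estimate that assumes every optimizer step consumes a full effective batch.

```python
# Illustrative cross-check of the hyperparameters listed in README.md.
# Function names are hypothetical; the numbers come from the model card.


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Number of samples that contribute to one optimizer (or eval) step."""
    return per_device * num_devices * grad_accum


def steps_per_epoch(step: int, epoch: float) -> float:
    """Optimizer steps per epoch implied by a logged (step, epoch) pair."""
    return step / epoch


total_train_batch_size = effective_batch_size(per_device=4, num_devices=4, grad_accum=32)
# Evaluation does not accumulate gradients, so the accumulation factor is 1.
total_eval_batch_size = effective_batch_size(per_device=4, num_devices=4, grad_accum=1)

# Last logged row: step 21000 at epoch 6.254770193041568 (see all_results.json).
implied_steps_per_epoch = steps_per_epoch(21000, 6.254770193041568)

# Rough estimate only: assumes each step sees a full batch of 512 samples.
estimated_samples_per_epoch = implied_steps_per_epoch * total_train_batch_size

print(total_train_batch_size, total_eval_batch_size)
print(round(implied_steps_per_epoch), round(estimated_samples_per_epoch))
```

Under these assumptions the products reproduce the reported `total_train_batch_size` of 512 and `total_eval_batch_size` of 16, and one epoch works out to roughly 3,357 optimizer steps.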
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b1ef0bbcba4c89044050568c4b772b42694b028dde31e057b7c6b1c515988ac2
+ oid sha256:c8a0715c5e354d77566d0ede2eb3a13b59965c15695b0dee0903d04958df11a7
  size 50366280
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 6.254770193041568,
+ "total_flos": 2.2989868385333726e+20,
+ "train_loss": 1.263435139360882,
+ "train_runtime": 585194.7226,
+ "train_samples_per_second": 29.375,
+ "train_steps_per_second": 0.057
+ }
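
The throughput figures in all_results.json can be sanity-checked with simple arithmetic. The sketch below is not part of the commit; it assumes the effective batch size of 512 reported in README.md, and the variable names are illustrative. Any reading of why the numbers line up the way they do is an inference from the values, not something the commit states.

```python
# Illustrative sanity check of all_results.json; not part of the commit.
import json

results = json.loads("""
{
    "epoch": 6.254770193041568,
    "total_flos": 2.2989868385333726e+20,
    "train_loss": 1.263435139360882,
    "train_runtime": 585194.7226,
    "train_samples_per_second": 29.375,
    "train_steps_per_second": 0.057
}
""")

EFFECTIVE_BATCH_SIZE = 512  # total_train_batch_size from README.md (assumed to apply)
LAST_LOGGED_STEP = 21_000   # final row of the training results table

# Wall-clock duration of the run in days (86,400 seconds per day).
runtime_days = results["train_runtime"] / 86_400

# Work implied by the reported throughput over the whole runtime.
implied_samples = results["train_samples_per_second"] * results["train_runtime"]
implied_steps = implied_samples / EFFECTIVE_BATCH_SIZE

# Convert implied steps to epochs using the logged step/epoch ratio.
steps_per_epoch = LAST_LOGGED_STEP / results["epoch"]
implied_epochs = implied_steps / steps_per_epoch

print(f"runtime: {runtime_days:.2f} days")
print(f"steps implied by throughput: {implied_steps:.0f}")
print(f"epochs implied by throughput: {implied_epochs:.2f}")
```

The runtime comes to a little under seven days. The implied step count lands near the 10 epochs configured in `num_epochs` rather than the 6.25 epochs actually reported, which suggests (but does not prove) that the throughput values were computed against the planned schedule.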
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 6.254770193041568,
+ "total_flos": 2.2989868385333726e+20,
+ "train_loss": 1.263435139360882,
+ "train_runtime": 585194.7226,
+ "train_samples_per_second": 29.375,
+ "train_steps_per_second": 0.057
+ }
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff