li-muyang committed
Commit d75a9a5 · verified · 1 Parent(s): 84702cc

Model save

README.md CHANGED
@@ -3,15 +3,11 @@ library_name: transformers
 license: apache-2.0
 base_model: mistralai/Mistral-7B-v0.1
 tags:
-- alignment-handbook
-- trl
-- sft
-- generated_from_trainer
 - trl
 - sft
 - generated_from_trainer
 datasets:
-- HuggingFaceH4/ultrachat_200k
+- generator
 model-index:
 - name: zephyr-7b-sft-full
   results: []
@@ -22,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b-sft-full
 
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the HuggingFaceH4/ultrachat_200k dataset.
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.9934
+- Loss: 1.0243
 
 ## Model description
 
@@ -44,43 +40,43 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size: 16
+- train_batch_size: 8
 - eval_batch_size: 16
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 16
-- total_train_batch_size: 256
+- total_train_batch_size: 128
 - total_eval_batch_size: 256
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 3.0
+- num_epochs: 3
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.9681 | 0.1845 | 100 | 0.9788 |
-| 0.9962 | 0.3690 | 200 | 1.0030 |
-| 0.9917 | 0.5535 | 300 | 1.0008 |
-| 0.9652 | 0.7380 | 400 | 0.9939 |
-| 0.9666 | 0.9225 | 500 | 0.9816 |
-| 0.7366 | 1.1070 | 600 | 0.9852 |
-| 0.7228 | 1.2915 | 700 | 0.9835 |
-| 0.7319 | 1.4760 | 800 | 0.9644 |
-| 0.7177 | 1.6605 | 900 | 0.9529 |
-| 0.7095 | 1.8450 | 1000 | 0.9394 |
-| 0.4465 | 2.0295 | 1100 | 0.9917 |
-| 0.4341 | 2.2140 | 1200 | 0.9979 |
-| 0.432 | 2.3985 | 1300 | 0.9954 |
-| 0.4301 | 2.5830 | 1400 | 0.9943 |
-| 0.4361 | 2.7675 | 1500 | 0.9931 |
-| 0.4256 | 2.9520 | 1600 | 0.9934 |
+| 0.9838 | 0.1845 | 200 | 0.9937 |
+| 1.0162 | 0.3690 | 400 | 1.0329 |
+| 1.0095 | 0.5535 | 600 | 1.0302 |
+| 0.9857 | 0.7380 | 800 | 1.0204 |
+| 0.9803 | 0.9225 | 1000 | 1.0051 |
+| 0.736 | 1.1070 | 1200 | 1.0061 |
+| 0.7249 | 1.2915 | 1400 | 1.0004 |
+| 0.7355 | 1.4760 | 1600 | 0.9855 |
+| 0.7151 | 1.6605 | 1800 | 0.9713 |
+| 0.7023 | 1.8450 | 2000 | 0.9557 |
+| 0.3925 | 2.0295 | 2200 | 1.0150 |
+| 0.3871 | 2.2140 | 2400 | 1.0319 |
+| 0.3927 | 2.3985 | 2600 | 1.0269 |
+| 0.3872 | 2.5830 | 2800 | 1.0267 |
+| 0.3918 | 2.7675 | 3000 | 1.0242 |
+| 0.3764 | 2.9520 | 3200 | 1.0243 |
 
 
 ### Framework versions
 
-- Transformers 4.45.2
-- Pytorch 2.2.2+rocm5.7
-- Datasets 3.2.0
-- Tokenizers 0.20.3
+- Transformers 4.51.3
+- Pytorch 2.5.1+rocm6.2
+- Datasets 3.5.0
+- Tokenizers 0.21.1
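
For reference, a minimal sketch of how the updated hyperparameters above could be expressed as `transformers.TrainingArguments`. This is not the training script from this repo: `output_dir`, `bf16`, and `gradient_accumulation_steps=1` are assumptions (8 samples per device × 16 devices = 128 total, matching the card); the remaining values are copied from the card.

```python
# Sketch only: card hyperparameters mapped onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-sft-full",   # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=8,     # train_batch_size in the card
    per_device_eval_batch_size=16,     # eval_batch_size in the card
    gradient_accumulation_steps=1,     # assumed: 8 * 16 GPUs = 128 total
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",               # OptimizerNames.ADAMW_TORCH
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                         # assumed; not stated in the card
)
```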
all_results.json CHANGED
@@ -1,14 +1,9 @@
 {
     "epoch": 3.0,
-    "eval_loss": 0.9934042692184448,
-    "eval_runtime": 518.8262,
-    "eval_samples": 23109,
-    "eval_samples_per_second": 29.586,
-    "eval_steps_per_second": 0.116,
     "total_flos": 1361805280542720.0,
-    "train_loss": 0.713569560815634,
-    "train_runtime": 59769.2599,
+    "train_loss": 0.7043063408920832,
+    "train_runtime": 81819.3933,
     "train_samples": 207864,
-    "train_samples_per_second": 6.961,
-    "train_steps_per_second": 0.027
+    "train_samples_per_second": 5.085,
+    "train_steps_per_second": 0.04
 }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 1,
   "eos_token_id": 2,
-  "transformers_version": "4.45.2"
+  "transformers_version": "4.51.3"
 }
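
A minimal sketch of reading the updated generation config back with `transformers`; the repo id below is an assumption based on the committer and model name, not something stated in this commit.

```python
# Sketch: load generation_config.json and inspect the fields touched by this commit.
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("li-muyang/zephyr-7b-sft-full")  # repo id assumed
print(gen_cfg.bos_token_id, gen_cfg.eos_token_id)  # 1, 2 (unchanged)
print(gen_cfg.transformers_version)                # "4.51.3" after this commit
```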
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 3.0,
     "total_flos": 1361805280542720.0,
-    "train_loss": 0.713569560815634,
-    "train_runtime": 59769.2599,
+    "train_loss": 0.7043063408920832,
+    "train_runtime": 81819.3933,
     "train_samples": 207864,
-    "train_samples_per_second": 6.961,
-    "train_steps_per_second": 0.027
+    "train_samples_per_second": 5.085,
+    "train_steps_per_second": 0.04
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff