|
---
library_name: peft
tags:
- generated_from_trainer
base_model: TheBloke/Llama-2-7B-fp16
model-index:
- name: Saiga_timelist_task200steps
  results: []
---
|
|
|
<!-- This model card has been generated automatically according to the information the Trainer had access to. You |
|
should probably proofread and complete it, then remove this comment. --> |
|
|
|
# Saiga_timelist_task200steps |
|
|
|
This model is a PEFT adapter fine-tuned from [TheBloke/Llama-2-7B-fp16](https://huggingface.co/TheBloke/Llama-2-7B-fp16) on an unknown dataset.
|
It achieves the following results on the evaluation set: |
|
- Loss: 2.4521 |
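
Because this repository contains a PEFT adapter rather than full model weights, it is loaded on top of the base model. The snippet below is a minimal, hypothetical loading sketch: the adapter id, prompt, and generation settings are placeholders, not values taken from the original training setup.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TheBloke/Llama-2-7B-fp16"
adapter_id = "Saiga_timelist_task200steps"  # placeholder: use the actual adapter repo id

# Load the fp16 base model and attach the adapter weights from this repository.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Example generation call; the prompt is illustrative only.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(base_model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```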
|
|
|
## Model description |
|
|
|
More information needed |
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (see the sketch after this list):
|
- learning_rate: 0.0003 |
|
- train_batch_size: 2 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 10 |
|
- total_train_batch_size: 20 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- training_steps: 200 |
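
The list above maps roughly onto the `transformers` `TrainingArguments` shown below. This is a hypothetical reconstruction for reference only: the output directory and the evaluation/logging cadence are assumptions, and the dataset and PEFT configuration of the original run are unknown.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the run configuration from the values listed
# above; output_dir and the evaluation/logging intervals are assumptions.
training_args = TrainingArguments(
    output_dir="Saiga_timelist_task200steps",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=10,   # effective train batch size: 2 * 10 = 20
    seed=42,
    lr_scheduler_type="linear",
    max_steps=200,
    optim="adamw_torch",              # the Adam betas/epsilon listed above are the library defaults
    evaluation_strategy="steps",
    eval_steps=2,                     # matches the 2-step interval in the results table
    logging_steps=2,
)
```

A `Trainer` would then combine these arguments with the PEFT-wrapped model and the (unknown) training and evaluation datasets.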
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.2298 | 0.37 | 2 | 2.2020 |
| 2.0975 | 0.74 | 4 | 2.1478 |
| 2.0243 | 1.11 | 6 | 2.1123 |
| 1.988 | 1.48 | 8 | 2.0857 |
| 1.9585 | 1.85 | 10 | 2.0692 |
| 1.883 | 2.22 | 12 | 2.0570 |
| 1.9078 | 2.59 | 14 | 2.0477 |
| 1.9179 | 2.96 | 16 | 2.0408 |
| 1.8663 | 3.33 | 18 | 2.0366 |
| 1.8191 | 3.7 | 20 | 2.0325 |
| 1.8515 | 4.07 | 22 | 2.0280 |
| 1.8189 | 4.44 | 24 | 2.0246 |
| 1.8478 | 4.81 | 26 | 2.0215 |
| 1.7767 | 5.19 | 28 | 2.0198 |
| 1.7685 | 5.56 | 30 | 2.0190 |
| 1.7895 | 5.93 | 32 | 2.0189 |
| 1.7285 | 6.3 | 34 | 2.0191 |
| 1.7609 | 6.67 | 36 | 2.0174 |
| 1.7138 | 7.04 | 38 | 2.0156 |
| 1.7112 | 7.41 | 40 | 2.0187 |
| 1.7029 | 7.78 | 42 | 2.0216 |
| 1.6787 | 8.15 | 44 | 2.0203 |
| 1.646 | 8.52 | 46 | 2.0243 |
| 1.5996 | 8.89 | 48 | 2.0294 |
| 1.6838 | 9.26 | 50 | 2.0280 |
| 1.6057 | 9.63 | 52 | 2.0254 |
| 1.574 | 10.0 | 54 | 2.0310 |
| 1.51 | 10.37 | 56 | 2.0547 |
| 1.5951 | 10.74 | 58 | 2.0420 |
| 1.5455 | 11.11 | 60 | 2.0350 |
| 1.5424 | 11.48 | 62 | 2.0612 |
| 1.4933 | 11.85 | 64 | 2.0652 |
| 1.5766 | 12.22 | 66 | 2.0537 |
| 1.4453 | 12.59 | 68 | 2.0732 |
| 1.4683 | 12.96 | 70 | 2.0763 |
| 1.4734 | 13.33 | 72 | 2.0805 |
| 1.4314 | 13.7 | 74 | 2.0908 |
| 1.3921 | 14.07 | 76 | 2.0815 |
| 1.4099 | 14.44 | 78 | 2.1134 |
| 1.4389 | 14.81 | 80 | 2.0955 |
| 1.3114 | 15.19 | 82 | 2.1153 |
| 1.3093 | 15.56 | 84 | 2.1303 |
| 1.3984 | 15.93 | 86 | 2.1246 |
| 1.2831 | 16.3 | 88 | 2.1564 |
| 1.2971 | 16.67 | 90 | 2.1284 |
| 1.3052 | 17.04 | 92 | 2.1608 |
| 1.2421 | 17.41 | 94 | 2.1556 |
| 1.1835 | 17.78 | 96 | 2.1734 |
| 1.283 | 18.15 | 98 | 2.1773 |
| 1.2311 | 18.52 | 100 | 2.1992 |
| 1.2428 | 18.89 | 102 | 2.1954 |
| 1.1959 | 19.26 | 104 | 2.2065 |
| 1.2376 | 19.63 | 106 | 2.2124 |
| 1.0689 | 20.0 | 108 | 2.2266 |
| 1.1471 | 20.37 | 110 | 2.2266 |
| 1.0068 | 20.74 | 112 | 2.2451 |
| 1.161 | 21.11 | 114 | 2.2501 |
| 1.1252 | 21.48 | 116 | 2.2579 |
| 1.0683 | 21.85 | 118 | 2.2595 |
| 1.1279 | 22.22 | 120 | 2.2904 |
| 0.9923 | 22.59 | 122 | 2.2693 |
| 1.0139 | 22.96 | 124 | 2.3008 |
| 0.9924 | 23.33 | 126 | 2.3036 |
| 1.0418 | 23.7 | 128 | 2.3277 |
| 1.0463 | 24.07 | 130 | 2.3043 |
| 1.0556 | 24.44 | 132 | 2.3262 |
| 0.9991 | 24.81 | 134 | 2.3299 |
| 0.96 | 25.19 | 136 | 2.3481 |
| 0.9677 | 25.56 | 138 | 2.3458 |
| 0.9107 | 25.93 | 140 | 2.3607 |
| 0.8962 | 26.3 | 142 | 2.3644 |
| 0.916 | 26.67 | 144 | 2.3700 |
| 0.9284 | 27.04 | 146 | 2.3726 |
| 0.99 | 27.41 | 148 | 2.3860 |
| 0.8308 | 27.78 | 150 | 2.3918 |
| 0.9459 | 28.15 | 152 | 2.3971 |
| 0.9283 | 28.52 | 154 | 2.4030 |
| 0.863 | 28.89 | 156 | 2.4024 |
| 0.9068 | 29.26 | 158 | 2.4083 |
| 0.8623 | 29.63 | 160 | 2.4179 |
| 0.8359 | 30.0 | 162 | 2.4262 |
| 0.953 | 30.37 | 164 | 2.4281 |
| 0.7937 | 30.74 | 166 | 2.4381 |
| 0.8274 | 31.11 | 168 | 2.4255 |
| 0.8862 | 31.48 | 170 | 2.4330 |
| 0.7913 | 31.85 | 172 | 2.4511 |
| 0.8436 | 32.22 | 174 | 2.4522 |
| 0.8519 | 32.59 | 176 | 2.4413 |
| 0.8089 | 32.96 | 178 | 2.4371 |
| 0.8876 | 33.33 | 180 | 2.4434 |
| 0.7836 | 33.7 | 182 | 2.4532 |
| 0.8232 | 34.07 | 184 | 2.4566 |
| 0.8299 | 34.44 | 186 | 2.4582 |
| 0.7977 | 34.81 | 188 | 2.4553 |
| 0.8635 | 35.19 | 190 | 2.4522 |
| 0.883 | 35.56 | 192 | 2.4518 |
| 0.8158 | 35.93 | 194 | 2.4513 |
| 0.8732 | 36.3 | 196 | 2.4518 |
| 0.8112 | 36.67 | 198 | 2.4522 |
| 0.7869 | 37.04 | 200 | 2.4521 |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.10.0 |
|
- Transformers 4.39.3 |
|
- Pytorch 2.2.2+cu121 |
|
- Datasets 2.18.0 |
|
- Tokenizers 0.15.2 |