mohit19906/falcon-7b-trained-model-V3

	---
	license: apache-2.0
	library_name: peft
	tags:
	- generated_from_trainer
	base_model: tiiuae/falcon-7b
	model-index:
	- name: working
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# working

	This model is a fine-tuned version of [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.6524
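
Because the card metadata lists `library_name: peft`, the repository presumably holds a PEFT adapter rather than full model weights. A minimal loading sketch under that assumption is shown below. The adapter ID is taken from the repository name, while the `load_adapter` helper name and the `bfloat16` dtype are illustrative choices that the card does not confirm.

```python
def load_adapter(adapter_id="mohit19906/falcon-7b-trained-model-V3",
                 base_id="tiiuae/falcon-7b"):
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Imports are local so this module can be imported without the heavy deps.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: bfloat16 weights fit the target hardware; adjust as needed.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # Assumption: the repository contains adapter weights loadable by PEFT.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter()` downloads the 7B base model, so it needs substantial disk space and memory.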

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 6
	- eval_batch_size: 6
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 24
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 2
	- num_epochs: 50
	- mixed_precision_training: Native AMP
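
Two of the values above follow from the others, and a short sketch makes the arithmetic explicit. It assumes a single device (which is what 6 × 4 = 24 implies) and takes the total step count of 250 from the last row of the training results table. The function names are illustrative, and the schedule is a plain-Python rendering of the usual linear warmup and decay rule, not code taken from the training run.

```python
LEARNING_RATE = 2e-4
PER_DEVICE_BATCH = 6
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 2
TOTAL_STEPS = 250  # assumption: final step reported in the training results


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer update."""
    return per_device * accum_steps * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to base_lr, then linear decay to zero at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))  # 24
for step in (0, 1, 2, 126, 250):
    print(step, linear_lr(step))
```

With only 2 warmup steps, the learning rate reaches its peak almost immediately and then decays over the remaining 248 steps.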

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 2.4729 | 0.95 | 5 | 2.1690 |
	| 2.3701 | 1.9 | 10 | 2.0002 |
	| 2.1498 | 2.86 | 15 | 1.8282 |
	| 1.5589 | 4.0 | 21 | 1.6084 |
	| 1.7043 | 4.95 | 26 | 1.4217 |
	| 1.315 | 5.9 | 31 | 1.2678 |
	| 1.1273 | 6.86 | 36 | 1.1439 |
	| 0.7666 | 8.0 | 42 | 1.0079 |
	| 0.762 | 8.95 | 47 | 0.9406 |
	| 0.6241 | 9.9 | 52 | 0.8969 |
	| 0.5211 | 10.86 | 57 | 0.8128 |
	| 0.3836 | 12.0 | 63 | 0.7788 |
	| 0.396 | 12.95 | 68 | 0.7358 |
	| 0.3255 | 13.9 | 73 | 0.7152 |
	| 0.2863 | 14.86 | 78 | 0.6880 |
	| 0.2405 | 16.0 | 84 | 0.6383 |
	| 0.2471 | 16.95 | 89 | 0.6414 |
	| 0.2214 | 17.9 | 94 | 0.6250 |
	| 0.2074 | 18.86 | 99 | 0.6280 |
	| 0.1682 | 20.0 | 105 | 0.6317 |
	| 0.1984 | 20.95 | 110 | 0.6157 |
	| 0.1817 | 21.9 | 115 | 0.6330 |
	| 0.1723 | 22.86 | 120 | 0.6154 |
	| 0.1393 | 24.0 | 126 | 0.6111 |
	| 0.1615 | 24.95 | 131 | 0.6330 |
	| 0.177 | 25.9 | 136 | 0.6185 |
	| 0.1462 | 26.86 | 141 | 0.6225 |
	| 0.1204 | 28.0 | 147 | 0.6225 |
	| 0.1494 | 28.95 | 152 | 0.6336 |
	| 0.1463 | 29.9 | 157 | 0.6345 |
	| 0.1507 | 30.86 | 162 | 0.6250 |
	| 0.1163 | 32.0 | 168 | 0.6178 |
	| 0.1556 | 32.95 | 173 | 0.6377 |
	| 0.1465 | 33.9 | 178 | 0.6392 |
	| 0.1391 | 34.86 | 183 | 0.6459 |
	| 0.113 | 36.0 | 189 | 0.6572 |
	| 0.1386 | 36.95 | 194 | 0.6436 |
	| 0.1497 | 37.9 | 199 | 0.6284 |
	| 0.1567 | 38.86 | 204 | 0.6434 |
	| 0.1097 | 40.0 | 210 | 0.6492 |
	| 0.1276 | 40.95 | 215 | 0.6451 |
	| 0.1423 | 41.9 | 220 | 0.6395 |
	| 0.126 | 42.86 | 225 | 0.6457 |
	| 0.1064 | 44.0 | 231 | 0.6517 |
	| 0.131 | 44.95 | 236 | 0.6525 |
	| 0.1224 | 45.9 | 241 | 0.6517 |
	| 0.1399 | 46.86 | 246 | 0.6521 |
	| 0.0991 | 47.62 | 250 | 0.6524 |
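
The reported evaluation loss (0.6524) is the value at the final step, not the lowest one in the table. The sketch below copies the step and validation-loss columns from the table and finds the minimum. The `best_checkpoint` helper is an illustrative name, and the card does not say whether a checkpoint was actually saved at that step.

```python
# (step, validation_loss) pairs copied from the training results table.
VAL_LOSS = [
    (5, 2.1690), (10, 2.0002), (15, 1.8282), (21, 1.6084), (26, 1.4217),
    (31, 1.2678), (36, 1.1439), (42, 1.0079), (47, 0.9406), (52, 0.8969),
    (57, 0.8128), (63, 0.7788), (68, 0.7358), (73, 0.7152), (78, 0.6880),
    (84, 0.6383), (89, 0.6414), (94, 0.6250), (99, 0.6280), (105, 0.6317),
    (110, 0.6157), (115, 0.6330), (120, 0.6154), (126, 0.6111), (131, 0.6330),
    (136, 0.6185), (141, 0.6225), (147, 0.6225), (152, 0.6336), (157, 0.6345),
    (162, 0.6250), (168, 0.6178), (173, 0.6377), (178, 0.6392), (183, 0.6459),
    (189, 0.6572), (194, 0.6436), (199, 0.6284), (204, 0.6434), (210, 0.6492),
    (215, 0.6451), (220, 0.6395), (225, 0.6457), (231, 0.6517), (236, 0.6525),
    (241, 0.6517), (246, 0.6521), (250, 0.6524),
]


def best_checkpoint(history):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(history, key=lambda row: row[1])


best_step, best_loss = best_checkpoint(VAL_LOSS)
final_step, final_loss = VAL_LOSS[-1]
print(f"best:  step {best_step}, validation loss {best_loss:.4f}")
print(f"final: step {final_step}, validation loss {final_loss:.4f}")
```

The minimum falls at step 126 (epoch 24.0), after which validation loss drifts upward while training loss keeps falling. That pattern is consistent with overfitting in the second half of the run, though the card gives no further evidence either way.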


	### Framework versions

	- PEFT 0.10.0
	- Transformers 4.38.2
	- PyTorch 2.1.2
	- Datasets 2.1.0
	- Tokenizers 0.15.2
