mohit19906/mistral-7b-Ins-IntentAndEntity

	---
	license: apache-2.0
	library_name: peft
	tags:
	- generated_from_trainer
	base_model: TheBloke/Mistral-7B-Instruct-v0.2-GPTQ
	model-index:
	- name: working
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# working

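	This model is a PEFT fine-tune of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
	At the final logged step (step 300, epoch 48.0) it achieves the following results on the evaluation set:
	- Loss: 1.6079

	The lowest validation loss in the training results is 1.0707, at step 43 (epoch 6.88). After that point validation loss rises while training loss keeps falling, which is consistent with overfitting.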

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 6
	- eval_batch_size: 6
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 24
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 2
	- num_epochs: 50
	- mixed_precision_training: Native AMP
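As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and evaluates a linear warmup-then-decay schedule in plain Python. It assumes a single device and 300 total optimizer steps (the last step logged in the training results); neither is stated in this card. The function name `linear_lr` is hypothetical, and this is not the Trainer's own code.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 6
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 2

# Assumptions (not stated in the card): one device, and 300 total
# optimizer steps, taken from the last row of the training results.
NUM_DEVICES = 1
TOTAL_STEPS = 300

# Samples contributing to each optimizer update.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    for step in (0, 1, 2, 151, 300):
        print(step, linear_lr(step))
```

With only 2 warmup steps, the learning rate reaches its peak almost immediately and then decays for the rest of the run.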

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 4.3979 | 0.96 | 6 | 3.3561 |
	| 2.837 | 1.92 | 12 | 2.2656 |
	| 1.9777 | 2.88 | 18 | 1.7212 |
	| 1.3641 | 4.0 | 25 | 1.4591 |
	| 1.3384 | 4.96 | 31 | 1.2543 |
	| 1.1314 | 5.92 | 37 | 1.1326 |
	| 0.9904 | 6.88 | 43 | 1.0707 |
	| 0.7908 | 8.0 | 50 | 1.0784 |
	| 0.8779 | 8.96 | 56 | 1.0891 |
	| 0.8415 | 9.92 | 62 | 1.1026 |
	| 0.8044 | 10.88 | 68 | 1.1326 |
	| 0.6611 | 12.0 | 75 | 1.1425 |
	| 0.7385 | 12.96 | 81 | 1.2161 |
	| 0.7071 | 13.92 | 87 | 1.2182 |
	| 0.6841 | 14.88 | 93 | 1.2865 |
	| 0.5671 | 16.0 | 100 | 1.3092 |
	| 0.6442 | 16.96 | 106 | 1.3813 |
	| 0.629 | 17.92 | 112 | 1.3295 |
	| 0.6197 | 18.88 | 118 | 1.4387 |
	| 0.522 | 20.0 | 125 | 1.3785 |
	| 0.6013 | 20.96 | 131 | 1.4355 |
	| 0.5928 | 21.92 | 137 | 1.4321 |
	| 0.5901 | 22.88 | 143 | 1.4711 |
	| 0.5015 | 24.0 | 150 | 1.4916 |
	| 0.5817 | 24.96 | 156 | 1.5001 |
	| 0.578 | 25.92 | 162 | 1.5077 |
	| 0.5758 | 26.88 | 168 | 1.5173 |
	| 0.4914 | 28.0 | 175 | 1.4935 |
	| 0.5732 | 28.96 | 181 | 1.5161 |
	| 0.5715 | 29.92 | 187 | 1.5131 |
	| 0.5696 | 30.88 | 193 | 1.5400 |
	| 0.4861 | 32.0 | 200 | 1.5338 |
	| 0.5666 | 32.96 | 206 | 1.5474 |
	| 0.5643 | 33.92 | 212 | 1.5519 |
	| 0.5643 | 34.88 | 218 | 1.5710 |
	| 0.4819 | 36.0 | 225 | 1.5723 |
	| 0.5607 | 36.96 | 231 | 1.5749 |
	| 0.5609 | 37.92 | 237 | 1.5677 |
	| 0.5598 | 38.88 | 243 | 1.5853 |
	| 0.4793 | 40.0 | 250 | 1.5951 |
	| 0.5587 | 40.96 | 256 | 1.5850 |
	| 0.5577 | 41.92 | 262 | 1.5904 |
	| 0.5568 | 42.88 | 268 | 1.5913 |
	| 0.477 | 44.0 | 275 | 1.5959 |
	| 0.5553 | 44.96 | 281 | 1.6042 |
	| 0.5556 | 45.92 | 287 | 1.6082 |
	| 0.5549 | 46.88 | 293 | 1.6075 |
	| 0.4749 | 48.0 | 300 | 1.6079 |
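
To make the trend in the table easier to check, here is a minimal sketch that picks the step with the lowest validation loss and compares it with the final step. The values are copied from the table; the names `VAL_LOSS_BY_STEP` and `best_checkpoint` are hypothetical and are not part of the training code.

```python
# (step, validation_loss) pairs copied from the training results table.
VAL_LOSS_BY_STEP = [
    (6, 3.3561), (12, 2.2656), (18, 1.7212), (25, 1.4591),
    (31, 1.2543), (37, 1.1326), (43, 1.0707), (50, 1.0784),
    (56, 1.0891), (62, 1.1026), (68, 1.1326), (75, 1.1425),
    (81, 1.2161), (87, 1.2182), (93, 1.2865), (100, 1.3092),
    (106, 1.3813), (112, 1.3295), (118, 1.4387), (125, 1.3785),
    (131, 1.4355), (137, 1.4321), (143, 1.4711), (150, 1.4916),
    (156, 1.5001), (162, 1.5077), (168, 1.5173), (175, 1.4935),
    (181, 1.5161), (187, 1.5131), (193, 1.5400), (200, 1.5338),
    (206, 1.5474), (212, 1.5519), (218, 1.5710), (225, 1.5723),
    (231, 1.5749), (237, 1.5677), (243, 1.5853), (250, 1.5951),
    (256, 1.5850), (262, 1.5904), (268, 1.5913), (275, 1.5959),
    (281, 1.6042), (287, 1.6082), (293, 1.6075), (300, 1.6079),
]


def best_checkpoint(history):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(history, key=lambda pair: pair[1])


best_step, best_loss = best_checkpoint(VAL_LOSS_BY_STEP)
final_step, final_loss = VAL_LOSS_BY_STEP[-1]

if __name__ == "__main__":
    print(f"best:  step {best_step}, validation loss {best_loss}")
    print(f"final: step {final_step}, validation loss {final_loss}")
```

Transformers provides `load_best_model_at_end` in `TrainingArguments` and an `EarlyStoppingCallback` for keeping the best-scoring checkpoint. This card does not record whether either was used for this run.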


	### Framework versions

	- PEFT 0.10.0
	- Transformers 4.39.3
	- PyTorch 2.1.2
	- Datasets 2.18.0
	- Tokenizers 0.15.2
