	---
	license: mit
	base_model: gpt2-xl
	tags:
	- generated_from_trainer
	datasets:
	- tyzhu/lmind_hotpot_train500_eval300_v1_recite_qa
	metrics:
	- accuracy
	model-index:
	- name: lmind_hotpot_train500_eval300_v1_recite_qa_gpt2-xl
	  results:
	  - task:
	      name: Causal Language Modeling
	      type: text-generation
	    dataset:
	      name: tyzhu/lmind_hotpot_train500_eval300_v1_recite_qa
	      type: tyzhu/lmind_hotpot_train500_eval300_v1_recite_qa
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.7413294517224648
	---


	# lmind_hotpot_train500_eval300_v1_recite_qa_gpt2-xl

	This model is a fine-tuned version of [gpt2-xl](https://huggingface.co/gpt2-xl) on the tyzhu/lmind_hotpot_train500_eval300_v1_recite_qa dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.4030
	- Accuracy: 0.7413
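
The card does not document how to run the model, so the following is only a minimal sketch of loading the checkpoint with the `transformers` text-generation pipeline. The repository id is assembled from the owner and model name shown on this card. The helper names `build_generator` and `answer`, the greedy decoding settings, and the prompt are illustrative assumptions, since the prompt format used by the recite-QA dataset is not described here.

```python
MODEL_ID = "tyzhu/lmind_hotpot_train500_eval300_v1_recite_qa_gpt2-xl"


def build_generator(model_id: str = MODEL_ID):
    """Return a text-generation pipeline for the fine-tuned checkpoint."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id)


def answer(generator, prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a continuation of `prompt` and return the full text."""
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return outputs[0]["generated_text"]


if __name__ == "__main__":
    # Downloads the gpt2-xl sized weights on first use.
    gen = build_generator()
    # Hypothetical prompt: the card does not specify the expected format.
    print(answer(gen, "Question: Which country is Kyoto in?\nAnswer:"))
```

Greedy decoding (`do_sample=False`) is chosen here only to make runs repeatable; the card does not say which decoding settings were used for evaluation.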

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: constant
	- num_epochs: 10.0
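
For readers who want to reproduce a similar run, the list above can be mapped onto `transformers.TrainingArguments` roughly as follows. This is a sketch under assumptions, not the author's training script: it treats the reported batch sizes as per-device values, leaves the optimizer at the Trainer default while setting only the listed betas and epsilon, and uses a hypothetical `output_dir`.

```python
# Hyperparameters copied from the list above, keyed by TrainingArguments names.
TRAINING_KWARGS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,  # assumption: reported size is per device
    "per_device_eval_batch_size": 16,   # assumption: reported size is per device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 10.0,
}


def build_training_args(output_dir: str = "outputs"):
    """Build TrainingArguments from the card's hyperparameters.

    `output_dir` is a hypothetical path; the card does not state one.
    """
    # Imported lazily so the kwargs can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Settings that the card does not list, such as gradient accumulation, mixed precision, or the evaluation interval, are left at library defaults in this sketch and may differ from the original run.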

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|
	| 2.0739 | 0.5 | 66 | 1.8405 | 0.6206 |
	| 1.8806 | 1.01 | 132 | 1.5749 | 0.6365 |
	| 1.3619 | 1.51 | 198 | 1.3425 | 0.6533 |
	| 1.283 | 2.02 | 264 | 1.1253 | 0.6685 |
	| 0.8433 | 2.52 | 330 | 0.9735 | 0.6825 |
	| 0.7629 | 3.02 | 396 | 0.7874 | 0.6982 |
	| 0.5058 | 3.53 | 462 | 0.6921 | 0.7086 |
	| 0.4593 | 4.03 | 528 | 0.5641 | 0.7197 |
	| 0.3064 | 4.53 | 594 | 0.5348 | 0.7245 |
	| 0.2967 | 5.04 | 660 | 0.4770 | 0.7304 |
	| 0.2167 | 5.54 | 726 | 0.4582 | 0.7324 |
	| 0.2157 | 6.05 | 792 | 0.4308 | 0.7358 |
	| 0.1597 | 6.55 | 858 | 0.4301 | 0.7373 |
	| 0.1481 | 7.05 | 924 | 0.4224 | 0.7385 |
	| 0.1293 | 7.56 | 990 | 0.4125 | 0.7394 |
	| 0.125 | 8.06 | 1056 | 0.4122 | 0.7400 |
	| 0.1139 | 8.56 | 1122 | 0.4069 | 0.7407 |
	| 0.1141 | 9.07 | 1188 | 0.4082 | 0.7409 |
	| 0.0994 | 9.57 | 1254 | 0.4065 | 0.7412 |
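
The training log can be summarized with a few lines of Python. The sketch below copies the rows from the table, picks the logged step with the lowest validation loss, estimates the number of optimizer steps per epoch from the last row, and converts a loss to perplexity. The perplexity conversion assumes the reported loss is a mean token-level cross-entropy in nats, which the card does not state explicitly; the function names are illustrative.

```python
import math

# (training_loss, epoch, step, validation_loss, accuracy), copied from the table.
ROWS = [
    (2.0739, 0.5, 66, 1.8405, 0.6206),
    (1.8806, 1.01, 132, 1.5749, 0.6365),
    (1.3619, 1.51, 198, 1.3425, 0.6533),
    (1.283, 2.02, 264, 1.1253, 0.6685),
    (0.8433, 2.52, 330, 0.9735, 0.6825),
    (0.7629, 3.02, 396, 0.7874, 0.6982),
    (0.5058, 3.53, 462, 0.6921, 0.7086),
    (0.4593, 4.03, 528, 0.5641, 0.7197),
    (0.3064, 4.53, 594, 0.5348, 0.7245),
    (0.2967, 5.04, 660, 0.4770, 0.7304),
    (0.2167, 5.54, 726, 0.4582, 0.7324),
    (0.2157, 6.05, 792, 0.4308, 0.7358),
    (0.1597, 6.55, 858, 0.4301, 0.7373),
    (0.1481, 7.05, 924, 0.4224, 0.7385),
    (0.1293, 7.56, 990, 0.4125, 0.7394),
    (0.125, 8.06, 1056, 0.4122, 0.7400),
    (0.1139, 8.56, 1122, 0.4069, 0.7407),
    (0.1141, 9.07, 1188, 0.4082, 0.7409),
    (0.0994, 9.57, 1254, 0.4065, 0.7412),
]


def best_checkpoint(rows):
    """Return the logged row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    _, epoch, step, _, _ = rows[-1]
    return step / epoch


def perplexity(loss):
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


if __name__ == "__main__":
    best = best_checkpoint(ROWS)
    print("best logged step:", best[2], "validation loss:", best[3])
    print("approx. steps per epoch:", round(steps_per_epoch(ROWS)))
    print("validation perplexity at best step:", round(perplexity(best[3]), 3))
```

Among the logged evaluations, the lowest validation loss is at step 1254, although the curve is not strictly monotonic (step 1188 is slightly worse than step 1122). By the last rows the training loss is well below the validation loss, which may indicate that further epochs would mostly fit the training set.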


	### Framework versions

	- Transformers 4.34.0
	- PyTorch 2.1.0+cu121
	- Datasets 2.14.5
	- Tokenizers 0.14.1
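
When trying to reproduce the reported numbers, it can help to compare the local environment against the versions listed above. The following sketch uses only the standard library; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption, as is the helper name `compare_versions`.

```python
from importlib import metadata

# Versions listed on the card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.34.0",
    "torch": "2.1.0+cu121",
    "datasets": "2.14.5",
    "tokenizers": "0.14.1",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (wanted, installed, matches).

    `installed` is None when the package is not found in the environment.
    """
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the check is only meant to flag differences that might explain small deviations in results.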