---
license: mit
tags:
- generated_from_trainer
datasets:
- squad_v2
base_model: microsoft/deberta-v3-large
model-index:
- name: deberta-v3-large-finetuned-squadv2
  results:
  - task:
      type: question-answering
      name: Extractive Question Answering
    dataset:
      name: SQuAD2.0
      type: squad_v2
      split: validation[:11873]
    metrics:
    - type: exact
      value: 88.69704371262529
      name: eval_exact
    - type: f1
      value: 91.51550564529175
      name: eval_f1
    - type: HasAns_exact
      value: 83.70445344129554
      name: HasAns_exact
    - type: HasAns_f1
      value: 89.34945994037624
      name: HasAns_f1
    - type: HasAns_total
      value: 5928
      name: HasAns_total
    - type: NoAns_exact
      value: 93.6753574432296
      name: NoAns_exact
    - type: NoAns_f1
      value: 93.6753574432296
      name: NoAns_f1
    - type: NoAns_total
      value: 5945
      name: NoAns_total
---
|
# deberta-v3-large-finetuned-squadv2

This model is a version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) fine-tuned on the SQuAD 2.0 dataset.

Fine-tuning and evaluation on an NVIDIA Titan RTX (24 GB) GPU took 15 hours.
|
|
|
## Results from the 2023 ICLR paper "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing" by Pengcheng He et al.

- 'EM' : 89.0
- 'F1' : 91.5
|
|
|
## Results calculated with:

```python
import evaluate

metrics = evaluate.load("squad_v2")  # SQuAD v2 metric; accounts for unanswerable questions
squad_v2_metrics = metrics.compute(predictions=formatted_predictions, references=references)
```
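
The metric expects one entry per example in both lists, roughly in the format below (a minimal, hypothetical sketch; the actual construction of `formatted_predictions` and `references` from the model outputs is not reproduced here):

```python
# Hypothetical single-example sketch of the inputs to metrics.compute().
# The squad_v2 metric additionally needs a no-answer probability per prediction.
formatted_predictions = [
    {"id": "example-0", "prediction_text": "France", "no_answer_probability": 0.0},
]
references = [
    {"id": "example-0", "answers": {"text": ["France"], "answer_start": [159]}},
]
```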
|
## For this fine-tuning:
|
- 'exact' : 88.70
- 'f1' : 91.52
- 'total' : 11873
- 'HasAns_exact' : 83.70
- 'HasAns_f1' : 89.35
- 'HasAns_total' : 5928
- 'NoAns_exact' : 93.68
- 'NoAns_f1' : 93.68
- 'NoAns_total' : 5945
- 'best_exact' : 88.70
- 'best_exact_thresh' : 0.0
- 'best_f1' : 91.52
- 'best_f1_thresh' : 0.0
|
|
|
## Model description

For the authors' models, code, and detailed information, see https://github.com/microsoft/DeBERTa
|
|
|
## Intended uses

Extractive question answering on a given context.
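
A minimal usage sketch with the Hugging Face Transformers `pipeline` (the model id below is a placeholder; substitute the actual hub repo id or a local checkpoint path):

```python
from transformers import pipeline

# Placeholder model id; replace with the actual repo id or local path.
qa = pipeline("question-answering", model="deberta-v3-large-finetuned-squadv2")

context = "The Amazon rainforest covers most of the Amazon basin of South America."
result = qa(question="What does the Amazon rainforest cover?", context=context)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': '...'}
```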
|
|
|
### Fine-tuning hyperparameters

The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during fine-tuning (see the `TrainingArguments` sketch after the list):

- learning_rate : 1e-05
- train_batch_size : 8
- eval_batch_size : 8
- seed : 42
- gradient_accumulation_steps : 8
- total_train_batch_size : 64
- optimizer : Adam with betas = (0.9, 0.999) and epsilon = 1e-06
- lr_scheduler_type : linear
- lr_scheduler_warmup_steps : 1000
- training_steps : 5200
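
A minimal sketch of how these settings map onto Hugging Face `TrainingArguments` (an assumed mapping; the actual training script is not reproduced here):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # effective train batch size: 8 * 8 = 64
    max_steps=5200,
    warmup_steps=1000,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    seed=42,
)
```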
|
|
|
### Framework versions

- Transformers : 4.35.0.dev0
- Pytorch : 2.1.0+cu121
- Datasets : 2.14.5
- Tokenizers : 0.14.0
|
|
|
### System

- CPU : Intel(R) Core(TM) i9-9900K - 32GB RAM
- Python version : 3.11.5 [GCC 11.2.0] (64-bit runtime)
- Python platform : Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- GPU : NVIDIA TITAN RTX - 24GB Memory
- CUDA runtime version : 12.1.105
- Nvidia driver version : 535.113.01
|
|
|
### Fine-tuning (Training) results before/after the best model (Step 3620)

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5323        | 1.72  | 3500 | 0.5860          |
| 0.5129        | 1.73  | 3520 | 0.5656          |
| 0.5441        | 1.74  | 3540 | 0.5642          |
| 0.5624        | 1.75  | 3560 | 0.5873          |
| 0.4645        | 1.76  | 3580 | 0.5891          |
| 0.5577        | 1.77  | 3600 | 0.5816          |
| 0.5199        | 1.78  | 3620 | 0.5579          |
| 0.5061        | 1.79  | 3640 | 0.5837          |
| 0.484         | 1.79  | 3660 | 0.5721          |
| 0.5095        | 1.8   | 3680 | 0.5821          |
| 0.5342        | 1.81  | 3700 | 0.5602          |