---
license: mit
tags:
- generated_from_trainer
datasets:
- squad_v2
base_model: microsoft/deberta-v3-large
model-index:
- name: deberta-v3-large-finetuned-squadv2
  results:
  - task:
      type: question-answering
      name: Extractive Question Answering
    dataset:
      name: SQuAD2.0
      type: squad_v2
      split: validation[:11873]
    metrics:
    - type: exact
      value: 88.69704371262529
      name: eval_exact
    - type: f1
      value: 91.51550564529175
      name: eval_f1
    - type: HasAns_exact
      value: 83.70445344129554
      name: HasAns_exact
    - type: HasAns_f1
      value: 89.34945994037624
      name: HasAns_f1
    - type: HasAns_total
      value: 5928
      name: HasAns_total
    - type: NoAns_exact
      value: 93.6753574432296
      name: NoAns_exact
    - type: NoAns_f1
      value: 93.6753574432296
      name: NoAns_f1
    - type: NoAns_total
      value: 5945
      name: NoAns_total
---
|
# deberta-v3-large-finetuned-squadv2

This model is a version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) fine-tuned on the SQuAD 2.0 dataset.

Fine-tuning and evaluation on an NVIDIA Titan RTX (24 GB) GPU took 15 hours.
|
|
|
## Results from the 2023 ICLR paper "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing" by Pengcheng He et al.

- 'EM' : 89.0
- 'F1' : 91.5
|
|
|
## Results calculated with:

```python
import evaluate

metrics = evaluate.load("squad_v2")  # SQuAD v2 metric; accounts for unanswerable questions
squad_v2_metrics = metrics.compute(predictions=formatted_predictions, references=references)
```
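
The metric expects one entry per example in both lists, roughly in the format below (a minimal, hypothetical sketch; the actual construction of `formatted_predictions` and `references` from the model outputs is not reproduced here):

```python
# Hypothetical single-example sketch of the inputs to metrics.compute().
# The squad_v2 metric additionally needs a no-answer probability per prediction.
formatted_predictions = [
    {"id": "example-0", "prediction_text": "France", "no_answer_probability": 0.0},
]
references = [
    {"id": "example-0", "answers": {"text": ["France"], "answer_start": [159]}},
]
```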
|
## For this fine-tuning:
|
- 'exact' : 88.70
- 'f1' : 91.52
- 'total' : 11873
- 'HasAns_exact' : 83.70
- 'HasAns_f1' : 89.35
- 'HasAns_total' : 5928
- 'NoAns_exact' : 93.68
- 'NoAns_f1' : 93.68
- 'NoAns_total' : 5945
- 'best_exact' : 88.70
- 'best_exact_thresh' : 0.0
- 'best_f1' : 91.52
- 'best_f1_thresh' : 0.0
|
|
|
## Model description

For the authors' models, code, and detailed information, see https://github.com/microsoft/DeBERTa
|
|
|
## Intended uses

Extractive question answering on a given context.
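
A minimal usage sketch with the Hugging Face Transformers `pipeline` (the model id below is a placeholder; substitute the actual hub repo id or a local checkpoint path):

```python
from transformers import pipeline

# Placeholder model id; replace with the actual repo id or local path.
qa = pipeline("question-answering", model="deberta-v3-large-finetuned-squadv2")

context = "The Amazon rainforest covers most of the Amazon basin of South America."
result = qa(question="What does the Amazon rainforest cover?", context=context)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': '...'}
```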
|
|
|
### Fine-tuning hyperparameters

The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during fine-tuning (see the `TrainingArguments` sketch after the list):

- learning_rate : 1e-05
- train_batch_size : 8
- eval_batch_size : 8
- seed : 42
- gradient_accumulation_steps : 8
- total_train_batch_size : 64
- optimizer : Adam with betas = (0.9, 0.999) and epsilon = 1e-06
- lr_scheduler_type : linear
- lr_scheduler_warmup_steps : 1000
- training_steps : 5200
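
A minimal sketch of how these settings map onto Hugging Face `TrainingArguments` (an assumed mapping; the actual training script is not reproduced here):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # effective train batch size: 8 * 8 = 64
    max_steps=5200,
    warmup_steps=1000,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    seed=42,
)
```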
|
|
|
### Framework versions

- Transformers : 4.35.0.dev0
- Pytorch : 2.1.0+cu121
- Datasets : 2.14.5
- Tokenizers : 0.14.0
|
|
|
### System

- CPU : Intel(R) Core(TM) i9-9900K - 32GB RAM
- Python version : 3.11.5 [GCC 11.2.0] (64-bit runtime)
- Python platform : Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- GPU : NVIDIA TITAN RTX - 24GB Memory
- CUDA runtime version : 12.1.105
- Nvidia driver version : 535.113.01
|
|
|
### Fine-tuning (Training) results before/after the best model (Step 3620)

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5323        | 1.72  | 3500 | 0.5860          |
| 0.5129        | 1.73  | 3520 | 0.5656          |
| 0.5441        | 1.74  | 3540 | 0.5642          |
| 0.5624        | 1.75  | 3560 | 0.5873          |
| 0.4645        | 1.76  | 3580 | 0.5891          |
| 0.5577        | 1.77  | 3600 | 0.5816          |
| 0.5199        | 1.78  | 3620 | 0.5579          |
| 0.5061        | 1.79  | 3640 | 0.5837          |
| 0.484         | 1.79  | 3660 | 0.5721          |
| 0.5095        | 1.8   | 3680 | 0.5821          |
| 0.5342        | 1.81  | 3700 | 0.5602          |