Update README.md
README.md (changed)
@@ -9,33 +9,36 @@ model-index:
 - name: deberta-v3-large-finetuned-squadv2
   results: []
 ---
-
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # deberta-v3-large-finetuned-squadv2
-
 This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) on the squad_v2 dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.5579
-
-## Model description
-
-More information needed
 
-## Intended uses & limitations
 
-More information needed
-
-## Training and evaluation data
-
-More information needed
 
-## Training procedure
 
 ### Training hyperparameters
-
-The following hyperparameters were used during training:
 - learning_rate: 1e-05
 - train_batch_size: 8
 - eval_batch_size: 8
@@ -47,25 +50,23 @@
 - lr_scheduler_warmup_steps: 1000
 - training_steps: 5200
 
-### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 0.5293 | 1.57 | 3200 | 0.5739 |
-| 0.5106 | 1.58 | 3220 | 0.5783 |
-| 0.5338 | 1.59 | 3240 | 0.5718 |
-| 0.5128 | 1.6 | 3260 | 0.5827 |
-| 0.5205 | 1.61 | 3280 | 0.6045 |
-| 0.5114 | 1.62 | 3300 | 0.5880 |
-| 0.5072 | 1.63 | 3320 | 0.5788 |
-| 0.5512 | 1.64 | 3340 | 0.5863 |
-| 0.4723 | 1.65 | 3360 | 0.5898 |
-| 0.5011 | 1.66 | 3380 | 0.5917 |
-| 0.5419 | 1.67 | 3400 | 0.6027 |
-| 0.5425 | 1.68 | 3420 | 0.5699 |
-| 0.5703 | 1.69 | 3440 | 0.5897 |
-| 0.4646 | 1.7 | 3460 | 0.5917 |
-| 0.4652 | 1.71 | 3480 | 0.5745 |
 | 0.5323 | 1.72 | 3500 | 0.5860 |
 | 0.5129 | 1.73 | 3520 | 0.5656 |
 | 0.5441 | 1.74 | 3540 | 0.5642 |
@@ -76,87 +77,4 @@
 | 0.5061 | 1.79 | 3640 | 0.5837 |
 | 0.484 | 1.79 | 3660 | 0.5721 |
 | 0.5095 | 1.8 | 3680 | 0.5821 |
-| 0.5342 | 1.81 | 3700 | 0.5602 |
-| 0.5435 | 1.82 | 3720 | 0.5911 |
-| 0.5288 | 1.83 | 3740 | 0.5647 |
-| 0.5476 | 1.84 | 3760 | 0.5733 |
-| 0.5199 | 1.85 | 3780 | 0.5675 |
-| 0.5067 | 1.86 | 3800 | 0.5839 |
-| 0.5418 | 1.87 | 3820 | 0.5757 |
-| 0.4965 | 1.88 | 3840 | 0.5764 |
-| 0.5273 | 1.89 | 3860 | 0.5906 |
-| 0.5808 | 1.9 | 3880 | 0.5762 |
-| 0.5161 | 1.91 | 3900 | 0.5612 |
-| 0.4863 | 1.92 | 3920 | 0.5804 |
-| 0.4827 | 1.93 | 3940 | 0.5841 |
-| 0.4643 | 1.94 | 3960 | 0.5822 |
-| 0.5029 | 1.95 | 3980 | 0.6052 |
-| 0.509 | 1.96 | 4000 | 0.5800 |
-| 0.5382 | 1.97 | 4020 | 0.5645 |
-| 0.469 | 1.98 | 4040 | 0.5685 |
-| 0.5032 | 1.99 | 4060 | 0.5779 |
-| 0.5171 | 2.0 | 4080 | 0.5686 |
-| 0.3938 | 2.01 | 4100 | 0.5889 |
-| 0.4321 | 2.02 | 4120 | 0.6039 |
-| 0.4185 | 2.03 | 4140 | 0.5996 |
-| 0.4782 | 2.04 | 4160 | 0.5800 |
-| 0.424 | 2.05 | 4180 | 0.6374 |
-| 0.3766 | 2.06 | 4200 | 0.6096 |
-| 0.415 | 2.07 | 4220 | 0.6221 |
-| 0.4352 | 2.08 | 4240 | 0.6150 |
-| 0.4336 | 2.09 | 4260 | 0.6055 |
-| 0.4289 | 2.1 | 4280 | 0.6138 |
-| 0.4433 | 2.11 | 4300 | 0.5946 |
-| 0.4478 | 2.12 | 4320 | 0.6118 |
-| 0.4787 | 2.13 | 4340 | 0.5969 |
-| 0.4432 | 2.14 | 4360 | 0.6048 |
-| 0.4319 | 2.15 | 4380 | 0.5948 |
-| 0.3939 | 2.16 | 4400 | 0.6116 |
-| 0.3921 | 2.17 | 4420 | 0.6082 |
-| 0.4381 | 2.18 | 4440 | 0.6282 |
-| 0.4461 | 2.19 | 4460 | 0.6084 |
-| 0.4012 | 2.2 | 4480 | 0.6092 |
-| 0.3849 | 2.21 | 4500 | 0.6152 |
-| 0.4178 | 2.22 | 4520 | 0.6004 |
-| 0.4163 | 2.23 | 4540 | 0.6059 |
-| 0.4006 | 2.24 | 4560 | 0.6115 |
-| 0.4225 | 2.25 | 4580 | 0.6130 |
-| 0.4008 | 2.26 | 4600 | 0.6095 |
-| 0.4706 | 2.27 | 4620 | 0.6136 |
-| 0.3902 | 2.28 | 4640 | 0.6103 |
-| 0.4048 | 2.29 | 4660 | 0.6085 |
-| 0.4411 | 2.3 | 4680 | 0.6139 |
-| 0.403 | 2.31 | 4700 | 0.6047 |
-| 0.4799 | 2.31 | 4720 | 0.6043 |
-| 0.4316 | 2.32 | 4740 | 0.5960 |
-| 0.4198 | 2.33 | 4760 | 0.6031 |
-| 0.4254 | 2.34 | 4780 | 0.6033 |
-| 0.387 | 2.35 | 4800 | 0.6120 |
-| 0.3882 | 2.36 | 4820 | 0.6128 |
-| 0.4307 | 2.37 | 4840 | 0.6150 |
-| 0.434 | 2.38 | 4860 | 0.6077 |
-| 0.4225 | 2.39 | 4880 | 0.6071 |
-| 0.4134 | 2.4 | 4900 | 0.6036 |
-| 0.3846 | 2.41 | 4920 | 0.6124 |
-| 0.3943 | 2.42 | 4940 | 0.6291 |
-| 0.4455 | 2.43 | 4960 | 0.6185 |
-| 0.4104 | 2.44 | 4980 | 0.6064 |
-| 0.4158 | 2.45 | 5000 | 0.6095 |
-| 0.4135 | 2.46 | 5020 | 0.6155 |
-| 0.3789 | 2.47 | 5040 | 0.6209 |
-| 0.418 | 2.48 | 5060 | 0.6106 |
-| 0.3931 | 2.49 | 5080 | 0.6047 |
-| 0.4289 | 2.5 | 5100 | 0.6055 |
-| 0.4051 | 2.51 | 5120 | 0.6084 |
-| 0.4217 | 2.52 | 5140 | 0.6118 |
-| 0.3843 | 2.53 | 5160 | 0.6139 |
-| 0.4435 | 2.54 | 5180 | 0.6126 |
-| 0.4274 | 2.55 | 5200 | 0.6120 |
-
-
-### Framework versions
-
-- Transformers 4.35.0.dev0
-- Pytorch 2.1.0+cu121
-- Datasets 2.14.5
-- Tokenizers 0.14.0
- name: deberta-v3-large-finetuned-squadv2
  results: []
---

# deberta-v3-large-finetuned-squadv2

This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) on the squad_v2 dataset.

## Results from the ICLR 2023 paper "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing" by Pengcheng He et al.

- EM: 89.0
- F1: 91.5

## Results from this fine-tuning

- exact: 88.70
- f1: 91.52
- total: 11873
- HasAns_exact: 83.70
- HasAns_f1: 89.35
- HasAns_total: 5928
- NoAns_exact: 93.68
- NoAns_f1: 93.68
- NoAns_total: 5945
- best_exact: 88.70
- best_exact_thresh: 0.0
- best_f1: 91.52
- best_f1_thresh: 0.0
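The exact-match and F1 numbers above follow the SQuAD v2 evaluation convention: answers are normalized (lowercased, punctuation and articles stripped) and F1 is computed over token overlap. A simplified sketch of that scoring, for illustration only (the official evaluation script additionally searches a no-answer threshold, which is where the `best_exact_thresh`/`best_f1_thresh` values come from):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation/articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)

def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted span and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Unanswerable questions: both sides empty counts as a perfect match.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))        # True
print(round(f1_score("in the city of Paris", "Paris"), 2))    # 0.4
```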

## Model description

For the authors' models, code & detailed information see: https://github.com/microsoft/DeBERTa

## Intended uses

Extractive question answering from a given context.
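At inference time, extractive QA selects the answer as the context span whose start and end positions receive the highest combined score. A minimal sketch of that decoding step with made-up scores (in practice the scores come from the model's start/end logits, and the whole step is handled internally by the transformers question-answering pipeline):

```python
def best_span(start_scores, end_scores, max_len=30):
    """Return (start, end) maximizing start_scores[s] + end_scores[e]
    subject to s <= e < s + max_len (answer length capped at max_len tokens)."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Toy scores over a 5-token context: tokens 2..3 form the best span.
start = [0.1, 0.2, 5.0, 0.3, 0.1]
end   = [0.1, 0.1, 0.4, 4.0, 0.2]
print(best_span(start, end))  # (2, 3)
```

For SQuAD v2 the pipeline also compares the best span score against the no-answer score, since roughly half the questions are unanswerable.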

### Training hyperparameters

The following hyperparameters, as suggested by the ICLR 2023 paper noted above, were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- lr_scheduler_warmup_steps: 1000
- training_steps: 5200
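In Trainer terms, the values above map roughly onto keyword arguments like the following (a sketch: the argument names follow the `transformers.TrainingArguments` API, and any setting not listed above is left at its default):

```python
# Hyperparameters from the list above, expressed as the keyword arguments
# one might pass to transformers.TrainingArguments.
training_config = dict(
    learning_rate=1e-5,              # learning_rate: 1e-05
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    warmup_steps=1000,               # lr_scheduler_warmup_steps: 1000
    max_steps=5200,                  # training_steps: 5200
)
print(training_config["max_steps"])  # 5200
```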

### Framework versions

- Transformers 4.35.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.0

### System

- CPU: Intel(R) Core(TM) i9-9900K - 32GB RAM
- Python version: 3.11.5 [GCC 11.2.0] (64-bit runtime)
- Python platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- GPU: NVIDIA TITAN RTX - 24GB Memory
- CUDA runtime version: 12.1.105
- Nvidia driver version: 535.113.01

### Training results before/after the best model (Step 3620)

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5323 | 1.72 | 3500 | 0.5860 |
| 0.5129 | 1.73 | 3520 | 0.5656 |
| 0.5441 | 1.74 | 3540 | 0.5642 |
| 0.5061 | 1.79 | 3640 | 0.5837 |
| 0.484 | 1.79 | 3660 | 0.5721 |
| 0.5095 | 1.8 | 3680 | 0.5821 |
| 0.5342 | 1.81 | 3700 | 0.5602 |