ahotrod committed on
Commit
4d3a18f
1 Parent(s): 88e894e

Update README.md

Files changed (1)
  1. README.md +38 -120
README.md CHANGED
@@ -9,33 +9,36 @@ model-index:
  - name: deberta-v3-large-finetuned-squadv2
  results: []
  ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # deberta-v3-large-finetuned-squadv2
-
  This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) on the squad_v2 dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.5579
-
- ## Model description
-
- More information needed

- ## Intended uses & limitations

- More information needed
-
- ## Training and evaluation data
-
- More information needed

- ## Training procedure

  ### Training hyperparameters
-
- The following hyperparameters were used during training:
  - learning_rate: 1e-05
  - train_batch_size: 8
  - eval_batch_size: 8
@@ -47,25 +50,23 @@ The following hyperparameters were used during training:
  - lr_scheduler_warmup_steps: 1000
  - training_steps: 5200

- ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 0.5293 | 1.57 | 3200 | 0.5739 |
- | 0.5106 | 1.58 | 3220 | 0.5783 |
- | 0.5338 | 1.59 | 3240 | 0.5718 |
- | 0.5128 | 1.6 | 3260 | 0.5827 |
- | 0.5205 | 1.61 | 3280 | 0.6045 |
- | 0.5114 | 1.62 | 3300 | 0.5880 |
- | 0.5072 | 1.63 | 3320 | 0.5788 |
- | 0.5512 | 1.64 | 3340 | 0.5863 |
- | 0.4723 | 1.65 | 3360 | 0.5898 |
- | 0.5011 | 1.66 | 3380 | 0.5917 |
- | 0.5419 | 1.67 | 3400 | 0.6027 |
- | 0.5425 | 1.68 | 3420 | 0.5699 |
- | 0.5703 | 1.69 | 3440 | 0.5897 |
- | 0.4646 | 1.7 | 3460 | 0.5917 |
- | 0.4652 | 1.71 | 3480 | 0.5745 |
  | 0.5323 | 1.72 | 3500 | 0.5860 |
  | 0.5129 | 1.73 | 3520 | 0.5656 |
  | 0.5441 | 1.74 | 3540 | 0.5642 |
@@ -76,87 +77,4 @@ The following hyperparameters were used during training:
  | 0.5061 | 1.79 | 3640 | 0.5837 |
  | 0.484 | 1.79 | 3660 | 0.5721 |
  | 0.5095 | 1.8 | 3680 | 0.5821 |
- | 0.5342 | 1.81 | 3700 | 0.5602 |
- | 0.5435 | 1.82 | 3720 | 0.5911 |
- | 0.5288 | 1.83 | 3740 | 0.5647 |
- | 0.5476 | 1.84 | 3760 | 0.5733 |
- | 0.5199 | 1.85 | 3780 | 0.5675 |
- | 0.5067 | 1.86 | 3800 | 0.5839 |
- | 0.5418 | 1.87 | 3820 | 0.5757 |
- | 0.4965 | 1.88 | 3840 | 0.5764 |
- | 0.5273 | 1.89 | 3860 | 0.5906 |
- | 0.5808 | 1.9 | 3880 | 0.5762 |
- | 0.5161 | 1.91 | 3900 | 0.5612 |
- | 0.4863 | 1.92 | 3920 | 0.5804 |
- | 0.4827 | 1.93 | 3940 | 0.5841 |
- | 0.4643 | 1.94 | 3960 | 0.5822 |
- | 0.5029 | 1.95 | 3980 | 0.6052 |
- | 0.509 | 1.96 | 4000 | 0.5800 |
- | 0.5382 | 1.97 | 4020 | 0.5645 |
- | 0.469 | 1.98 | 4040 | 0.5685 |
- | 0.5032 | 1.99 | 4060 | 0.5779 |
- | 0.5171 | 2.0 | 4080 | 0.5686 |
- | 0.3938 | 2.01 | 4100 | 0.5889 |
- | 0.4321 | 2.02 | 4120 | 0.6039 |
- | 0.4185 | 2.03 | 4140 | 0.5996 |
- | 0.4782 | 2.04 | 4160 | 0.5800 |
- | 0.424 | 2.05 | 4180 | 0.6374 |
- | 0.3766 | 2.06 | 4200 | 0.6096 |
- | 0.415 | 2.07 | 4220 | 0.6221 |
- | 0.4352 | 2.08 | 4240 | 0.6150 |
- | 0.4336 | 2.09 | 4260 | 0.6055 |
- | 0.4289 | 2.1 | 4280 | 0.6138 |
- | 0.4433 | 2.11 | 4300 | 0.5946 |
- | 0.4478 | 2.12 | 4320 | 0.6118 |
- | 0.4787 | 2.13 | 4340 | 0.5969 |
- | 0.4432 | 2.14 | 4360 | 0.6048 |
- | 0.4319 | 2.15 | 4380 | 0.5948 |
- | 0.3939 | 2.16 | 4400 | 0.6116 |
- | 0.3921 | 2.17 | 4420 | 0.6082 |
- | 0.4381 | 2.18 | 4440 | 0.6282 |
- | 0.4461 | 2.19 | 4460 | 0.6084 |
- | 0.4012 | 2.2 | 4480 | 0.6092 |
- | 0.3849 | 2.21 | 4500 | 0.6152 |
- | 0.4178 | 2.22 | 4520 | 0.6004 |
- | 0.4163 | 2.23 | 4540 | 0.6059 |
- | 0.4006 | 2.24 | 4560 | 0.6115 |
- | 0.4225 | 2.25 | 4580 | 0.6130 |
- | 0.4008 | 2.26 | 4600 | 0.6095 |
- | 0.4706 | 2.27 | 4620 | 0.6136 |
- | 0.3902 | 2.28 | 4640 | 0.6103 |
- | 0.4048 | 2.29 | 4660 | 0.6085 |
- | 0.4411 | 2.3 | 4680 | 0.6139 |
- | 0.403 | 2.31 | 4700 | 0.6047 |
- | 0.4799 | 2.31 | 4720 | 0.6043 |
- | 0.4316 | 2.32 | 4740 | 0.5960 |
- | 0.4198 | 2.33 | 4760 | 0.6031 |
- | 0.4254 | 2.34 | 4780 | 0.6033 |
- | 0.387 | 2.35 | 4800 | 0.6120 |
- | 0.3882 | 2.36 | 4820 | 0.6128 |
- | 0.4307 | 2.37 | 4840 | 0.6150 |
- | 0.434 | 2.38 | 4860 | 0.6077 |
- | 0.4225 | 2.39 | 4880 | 0.6071 |
- | 0.4134 | 2.4 | 4900 | 0.6036 |
- | 0.3846 | 2.41 | 4920 | 0.6124 |
- | 0.3943 | 2.42 | 4940 | 0.6291 |
- | 0.4455 | 2.43 | 4960 | 0.6185 |
- | 0.4104 | 2.44 | 4980 | 0.6064 |
- | 0.4158 | 2.45 | 5000 | 0.6095 |
- | 0.4135 | 2.46 | 5020 | 0.6155 |
- | 0.3789 | 2.47 | 5040 | 0.6209 |
- | 0.418 | 2.48 | 5060 | 0.6106 |
- | 0.3931 | 2.49 | 5080 | 0.6047 |
- | 0.4289 | 2.5 | 5100 | 0.6055 |
- | 0.4051 | 2.51 | 5120 | 0.6084 |
- | 0.4217 | 2.52 | 5140 | 0.6118 |
- | 0.3843 | 2.53 | 5160 | 0.6139 |
- | 0.4435 | 2.54 | 5180 | 0.6126 |
- | 0.4274 | 2.55 | 5200 | 0.6120 |
-
-
- ### Framework versions
-
- - Transformers 4.35.0.dev0
- - Pytorch 2.1.0+cu121
- - Datasets 2.14.5
- - Tokenizers 0.14.0
 
  - name: deberta-v3-large-finetuned-squadv2
  results: []
  ---

  # deberta-v3-large-finetuned-squadv2

  This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) on the squad_v2 dataset.

+ ## Results from the 2023 ICLR paper, "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing", by Pengcheng He, et al.
+ - EM: 89.0
+ - F1: 91.5
+
+ ## Results from this fine-tuning
+ - exact: 88.70
+ - f1: 91.52
+ - total: 11873
+ - HasAns_exact: 83.70
+ - HasAns_f1: 89.35
+ - HasAns_total: 5928
+ - NoAns_exact: 93.68
+ - NoAns_f1: 93.68
+ - NoAns_total: 5945
+ - best_exact: 88.70
+ - best_exact_thresh: 0.0
+ - best_f1: 91.52
+ - best_f1_thresh: 0.0

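The numbers above are the standard SQuAD v2 metric keys. As a minimal, hedged sketch of how such numbers can be produced (not the author's actual evaluation script), the `evaluate` library's `squad_v2` metric returns the same keys; the single prediction/reference pair below is purely illustrative.

```python
# Minimal sketch: computing SQuAD v2 metrics with the `evaluate` library.
# The example id, prediction, and reference are made up for illustration;
# they are not taken from the evaluation reported above.
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

predictions = [{
    "id": "example-0",                 # hypothetical question id
    "prediction_text": "ELECTRA-style pre-training",
    "no_answer_probability": 0.0,      # raise this for "no answer" predictions
}]
references = [{
    "id": "example-0",
    "answers": {"text": ["ELECTRA-style pre-training"], "answer_start": [0]},
}]

# Returns exact, f1, total, HasAns_* and best_* keys
# (NoAns_* appears when unanswerable questions are included).
print(squad_v2_metric.compute(predictions=predictions, references=references))
```
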
+ ## Model description
+ For the authors' models, code, and detailed information, see: https://github.com/microsoft/DeBERTa

+ ## Intended uses
+ Extractive question answering from a given context.
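As a usage sketch for that intended use, the checkpoint can be served through the standard `transformers` question-answering pipeline; the repo id below is an assumption inferred from this card's name and committer, not something the card states.

```python
# Hedged sketch: extractive QA with the question-answering pipeline.
# The repo id is an assumption inferred from the card, not confirmed by it.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="ahotrod/deberta-v3-large-finetuned-squadv2",
)

result = qa(
    question="What pre-training style does DeBERTaV3 use?",
    context="DeBERTaV3 improves DeBERTa with ELECTRA-style pre-training "
            "and gradient-disentangled embedding sharing.",
    handle_impossible_answer=True,  # SQuAD v2: allow an empty "no answer" span
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```
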
 
  ### Training hyperparameters
+ The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during training:
  - learning_rate: 1e-05
  - train_batch_size: 8
  - eval_batch_size: 8
  - lr_scheduler_warmup_steps: 1000
  - training_steps: 5200

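For readers who want to mirror these settings, the listed values map onto Hugging Face `TrainingArguments` roughly as below; this is a hedged sketch, not the author's script, and the optimizer, scheduler type, and other arguments elided from the diff hunk are not reproduced.

```python
# Hedged sketch mapping the listed hyperparameters onto TrainingArguments.
# output_dir is hypothetical; unlisted settings are left at their defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=1000,   # lr_scheduler_warmup_steps
    max_steps=5200,      # training_steps
)
```
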
+ ### Framework versions
+ - Transformers 4.35.0.dev0
+ - Pytorch 2.1.0+cu121
+ - Datasets 2.14.5
+ - Tokenizers 0.14.0
+
+ ### System
+ - CPU: Intel(R) Core(TM) i9-9900K - 32GB RAM
+ - Python version: 3.11.5 [GCC 11.2.0] (64-bit runtime)
+ - Python platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
+ - GPU: NVIDIA TITAN RTX - 24GB Memory
+ - CUDA runtime version: 12.1.105
+ - Nvidia driver version: 535.113.01

+ ### Training results before/after the best model (Step 3620)
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
  | 0.5323 | 1.72 | 3500 | 0.5860 |
  | 0.5129 | 1.73 | 3520 | 0.5656 |
  | 0.5441 | 1.74 | 3540 | 0.5642 |
  | 0.5061 | 1.79 | 3640 | 0.5837 |
  | 0.484 | 1.79 | 3660 | 0.5721 |
  | 0.5095 | 1.8 | 3680 | 0.5821 |
+ | 0.5342 | 1.81 | 3700 | 0.5602 |
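The 20-step evaluation cadence visible in the table, together with the "best model (Step 3620)" note, suggests checkpoint selection on validation loss. A hedged sketch of how that is typically configured with the `Trainer` follows; it is an assumption, not the author's confirmed setup.

```python
# Hedged sketch: evaluate/save every 20 steps and keep the checkpoint with the
# lowest validation loss, as implied by the table above. Not the author's
# confirmed configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",  # hypothetical path
    evaluation_strategy="steps",
    eval_steps=20,
    save_strategy="steps",
    save_steps=20,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```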