ahotrod committed on
Commit
4d3a18f
1 Parent(s): 88e894e

Update README.md

Files changed (1)
  1. README.md +38 -120
README.md CHANGED
@@ -9,33 +9,36 @@ model-index:
  - name: deberta-v3-large-finetuned-squadv2
  results: []
  ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # deberta-v3-large-finetuned-squadv2
-
  This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) on the squad_v2 dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.5579
-
- ## Model description
-
- More information needed

- ## Intended uses & limitations

- More information needed
-
- ## Training and evaluation data
-
- More information needed

- ## Training procedure

  ### Training hyperparameters
-
- The following hyperparameters were used during training:
  - learning_rate: 1e-05
  - train_batch_size: 8
  - eval_batch_size: 8
@@ -47,25 +50,23 @@ The following hyperparameters were used during training:
  - lr_scheduler_warmup_steps: 1000
  - training_steps: 5200

- ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 0.5293 | 1.57 | 3200 | 0.5739 |
- | 0.5106 | 1.58 | 3220 | 0.5783 |
- | 0.5338 | 1.59 | 3240 | 0.5718 |
- | 0.5128 | 1.6 | 3260 | 0.5827 |
- | 0.5205 | 1.61 | 3280 | 0.6045 |
- | 0.5114 | 1.62 | 3300 | 0.5880 |
- | 0.5072 | 1.63 | 3320 | 0.5788 |
- | 0.5512 | 1.64 | 3340 | 0.5863 |
- | 0.4723 | 1.65 | 3360 | 0.5898 |
- | 0.5011 | 1.66 | 3380 | 0.5917 |
- | 0.5419 | 1.67 | 3400 | 0.6027 |
- | 0.5425 | 1.68 | 3420 | 0.5699 |
- | 0.5703 | 1.69 | 3440 | 0.5897 |
- | 0.4646 | 1.7 | 3460 | 0.5917 |
- | 0.4652 | 1.71 | 3480 | 0.5745 |
  | 0.5323 | 1.72 | 3500 | 0.5860 |
  | 0.5129 | 1.73 | 3520 | 0.5656 |
  | 0.5441 | 1.74 | 3540 | 0.5642 |
@@ -76,87 +77,4 @@ The following hyperparameters were used during training:
  | 0.5061 | 1.79 | 3640 | 0.5837 |
  | 0.484 | 1.79 | 3660 | 0.5721 |
  | 0.5095 | 1.8 | 3680 | 0.5821 |
- | 0.5342 | 1.81 | 3700 | 0.5602 |
- | 0.5435 | 1.82 | 3720 | 0.5911 |
- | 0.5288 | 1.83 | 3740 | 0.5647 |
- | 0.5476 | 1.84 | 3760 | 0.5733 |
- | 0.5199 | 1.85 | 3780 | 0.5675 |
- | 0.5067 | 1.86 | 3800 | 0.5839 |
- | 0.5418 | 1.87 | 3820 | 0.5757 |
- | 0.4965 | 1.88 | 3840 | 0.5764 |
- | 0.5273 | 1.89 | 3860 | 0.5906 |
- | 0.5808 | 1.9 | 3880 | 0.5762 |
- | 0.5161 | 1.91 | 3900 | 0.5612 |
- | 0.4863 | 1.92 | 3920 | 0.5804 |
- | 0.4827 | 1.93 | 3940 | 0.5841 |
- | 0.4643 | 1.94 | 3960 | 0.5822 |
- | 0.5029 | 1.95 | 3980 | 0.6052 |
- | 0.509 | 1.96 | 4000 | 0.5800 |
- | 0.5382 | 1.97 | 4020 | 0.5645 |
- | 0.469 | 1.98 | 4040 | 0.5685 |
- | 0.5032 | 1.99 | 4060 | 0.5779 |
- | 0.5171 | 2.0 | 4080 | 0.5686 |
- | 0.3938 | 2.01 | 4100 | 0.5889 |
- | 0.4321 | 2.02 | 4120 | 0.6039 |
- | 0.4185 | 2.03 | 4140 | 0.5996 |
- | 0.4782 | 2.04 | 4160 | 0.5800 |
- | 0.424 | 2.05 | 4180 | 0.6374 |
- | 0.3766 | 2.06 | 4200 | 0.6096 |
- | 0.415 | 2.07 | 4220 | 0.6221 |
- | 0.4352 | 2.08 | 4240 | 0.6150 |
- | 0.4336 | 2.09 | 4260 | 0.6055 |
- | 0.4289 | 2.1 | 4280 | 0.6138 |
- | 0.4433 | 2.11 | 4300 | 0.5946 |
- | 0.4478 | 2.12 | 4320 | 0.6118 |
- | 0.4787 | 2.13 | 4340 | 0.5969 |
- | 0.4432 | 2.14 | 4360 | 0.6048 |
- | 0.4319 | 2.15 | 4380 | 0.5948 |
- | 0.3939 | 2.16 | 4400 | 0.6116 |
- | 0.3921 | 2.17 | 4420 | 0.6082 |
- | 0.4381 | 2.18 | 4440 | 0.6282 |
- | 0.4461 | 2.19 | 4460 | 0.6084 |
- | 0.4012 | 2.2 | 4480 | 0.6092 |
- | 0.3849 | 2.21 | 4500 | 0.6152 |
- | 0.4178 | 2.22 | 4520 | 0.6004 |
- | 0.4163 | 2.23 | 4540 | 0.6059 |
- | 0.4006 | 2.24 | 4560 | 0.6115 |
- | 0.4225 | 2.25 | 4580 | 0.6130 |
- | 0.4008 | 2.26 | 4600 | 0.6095 |
- | 0.4706 | 2.27 | 4620 | 0.6136 |
- | 0.3902 | 2.28 | 4640 | 0.6103 |
- | 0.4048 | 2.29 | 4660 | 0.6085 |
- | 0.4411 | 2.3 | 4680 | 0.6139 |
- | 0.403 | 2.31 | 4700 | 0.6047 |
- | 0.4799 | 2.31 | 4720 | 0.6043 |
- | 0.4316 | 2.32 | 4740 | 0.5960 |
- | 0.4198 | 2.33 | 4760 | 0.6031 |
- | 0.4254 | 2.34 | 4780 | 0.6033 |
- | 0.387 | 2.35 | 4800 | 0.6120 |
- | 0.3882 | 2.36 | 4820 | 0.6128 |
- | 0.4307 | 2.37 | 4840 | 0.6150 |
- | 0.434 | 2.38 | 4860 | 0.6077 |
- | 0.4225 | 2.39 | 4880 | 0.6071 |
- | 0.4134 | 2.4 | 4900 | 0.6036 |
- | 0.3846 | 2.41 | 4920 | 0.6124 |
- | 0.3943 | 2.42 | 4940 | 0.6291 |
- | 0.4455 | 2.43 | 4960 | 0.6185 |
- | 0.4104 | 2.44 | 4980 | 0.6064 |
- | 0.4158 | 2.45 | 5000 | 0.6095 |
- | 0.4135 | 2.46 | 5020 | 0.6155 |
- | 0.3789 | 2.47 | 5040 | 0.6209 |
- | 0.418 | 2.48 | 5060 | 0.6106 |
- | 0.3931 | 2.49 | 5080 | 0.6047 |
- | 0.4289 | 2.5 | 5100 | 0.6055 |
- | 0.4051 | 2.51 | 5120 | 0.6084 |
- | 0.4217 | 2.52 | 5140 | 0.6118 |
- | 0.3843 | 2.53 | 5160 | 0.6139 |
- | 0.4435 | 2.54 | 5180 | 0.6126 |
- | 0.4274 | 2.55 | 5200 | 0.6120 |
-
-
- ### Framework versions
-
- - Transformers 4.35.0.dev0
- - Pytorch 2.1.0+cu121
- - Datasets 2.14.5
- - Tokenizers 0.14.0
 
  - name: deberta-v3-large-finetuned-squadv2
  results: []
  ---

  # deberta-v3-large-finetuned-squadv2

  This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) on the squad_v2 dataset.

+ ## Results from the 2023 ICLR paper, "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing", by Pengcheng He, et al.
+ - EM: 89.0
+ - F1: 91.5
+
+ ## Results from this fine-tuning
+ - exact: 88.70
+ - f1: 91.52
+ - total: 11873
+ - HasAns_exact: 83.70
+ - HasAns_f1: 89.35
+ - HasAns_total: 5928
+ - NoAns_exact: 93.68
+ - NoAns_f1: 93.68
+ - NoAns_total: 5945
+ - best_exact: 88.70
+ - best_exact_thresh: 0.0
+ - best_f1: 91.52
+ - best_f1_thresh: 0.0

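The numbers above are the standard SQuAD v2 metric keys. As a minimal, hedged sketch of how such numbers can be produced (not the author's actual evaluation script), the `evaluate` library's `squad_v2` metric returns the same keys; the single prediction/reference pair below is purely illustrative.

```python
# Minimal sketch: computing SQuAD v2 metrics with the `evaluate` library.
# The example id, prediction, and reference are made up for illustration;
# they are not taken from the evaluation reported above.
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

predictions = [{
    "id": "example-0",                 # hypothetical question id
    "prediction_text": "ELECTRA-style pre-training",
    "no_answer_probability": 0.0,      # raise this for "no answer" predictions
}]
references = [{
    "id": "example-0",
    "answers": {"text": ["ELECTRA-style pre-training"], "answer_start": [0]},
}]

# Returns exact, f1, total, HasAns_* and best_* keys
# (NoAns_* appears when unanswerable questions are included).
print(squad_v2_metric.compute(predictions=predictions, references=references))
```
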
+ ## Model description
+ For the authors' models, code, and detailed information, see: https://github.com/microsoft/DeBERTa

+ ## Intended uses
+ Extractive question answering from a given context.
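As a usage sketch for that intended use, the checkpoint can be served through the standard `transformers` question-answering pipeline; the repo id below is an assumption inferred from this card's name and committer, not something the card states.

```python
# Hedged sketch: extractive QA with the question-answering pipeline.
# The repo id is an assumption inferred from the card, not confirmed by it.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="ahotrod/deberta-v3-large-finetuned-squadv2",
)

result = qa(
    question="What pre-training style does DeBERTaV3 use?",
    context="DeBERTaV3 improves DeBERTa with ELECTRA-style pre-training "
            "and gradient-disentangled embedding sharing.",
    handle_impossible_answer=True,  # SQuAD v2: allow an empty "no answer" span
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```
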
 
  ### Training hyperparameters
+ The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during training:
  - learning_rate: 1e-05
  - train_batch_size: 8
  - eval_batch_size: 8
  - lr_scheduler_warmup_steps: 1000
  - training_steps: 5200

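For readers who want to mirror these settings, the listed values map onto Hugging Face `TrainingArguments` roughly as below; this is a hedged sketch, not the author's script, and the optimizer, scheduler type, and other arguments elided from the diff hunk are not reproduced.

```python
# Hedged sketch mapping the listed hyperparameters onto TrainingArguments.
# output_dir is hypothetical; unlisted settings are left at their defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=1000,   # lr_scheduler_warmup_steps
    max_steps=5200,      # training_steps
)
```
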
+ ### Framework versions
+ - Transformers 4.35.0.dev0
+ - Pytorch 2.1.0+cu121
+ - Datasets 2.14.5
+ - Tokenizers 0.14.0
+
+ ### System
+ - CPU: Intel(R) Core(TM) i9-9900K - 32GB RAM
+ - Python version: 3.11.5 [GCC 11.2.0] (64-bit runtime)
+ - Python platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
+ - GPU: NVIDIA TITAN RTX - 24GB Memory
+ - CUDA runtime version: 12.1.105
+ - Nvidia driver version: 535.113.01

+ ### Training results before/after the best model (Step 3620)
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
  | 0.5323 | 1.72 | 3500 | 0.5860 |
  | 0.5129 | 1.73 | 3520 | 0.5656 |
  | 0.5441 | 1.74 | 3540 | 0.5642 |
  | 0.5061 | 1.79 | 3640 | 0.5837 |
  | 0.484 | 1.79 | 3660 | 0.5721 |
  | 0.5095 | 1.8 | 3680 | 0.5821 |
+ | 0.5342 | 1.81 | 3700 | 0.5602 |
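The 20-step evaluation cadence visible in the table, together with the "best model (Step 3620)" note, suggests checkpoint selection on validation loss. A hedged sketch of how that is typically configured with the `Trainer` follows; it is an assumption, not the author's confirmed setup.

```python
# Hedged sketch: evaluate/save every 20 steps and keep the checkpoint with the
# lowest validation loss, as implied by the table above. Not the author's
# confirmed configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",  # hypothetical path
    evaluation_strategy="steps",
    eval_steps=20,
    save_strategy="steps",
    save_steps=20,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```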