bert-large-uncased-swag

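This model is a fine-tuned version of google-bert/bert-large-uncased on the SWAG dataset. At the last logged evaluation (step 22500, epoch 4.8945) it reaches a validation loss of 0.4643 and an accuracy of 0.8295; the best logged accuracy is 0.8304, at step 21500.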

Model description

This model should be used as an expert in the Meteor-of-LoRA framework.

The data were split following the default splits of the Hugging Face dataset:

from datasets import load_dataset
dataset = load_dataset("swag")

Our approach adapts only the query (Wq) and value (Wv) projection weights in the Transformer attention modules, which keeps the number of trainable parameters small.
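
To give a feel for the savings, here is a rough parameter count. It assumes the adaptation is a standard LoRA low-rank update (each adapted d x d weight gets an update B @ A, with A of shape r x d and B of shape d x r), which this card does not state explicitly. The rank `RANK = 8` and the helper `lora_param_count` are hypothetical and chosen only for illustration; the layer count (24) and hidden size (1024) are those of BERT-large.

```python
def lora_param_count(hidden_size, num_layers, rank, adapted_per_layer=2):
    """Trainable parameters added by a LoRA-style update.

    Each adapted (hidden_size x hidden_size) projection gets two factors:
    A with shape (rank, hidden_size) and B with shape (hidden_size, rank).
    adapted_per_layer=2 corresponds to adapting Wq and Wv only.
    """
    per_matrix = rank * hidden_size + hidden_size * rank
    return num_layers * adapted_per_layer * per_matrix


# BERT-large dimensions.
HIDDEN, LAYERS = 1024, 24
# Hypothetical rank: the card does not report the value actually used.
RANK = 8

lora_params = lora_param_count(HIDDEN, LAYERS, RANK)
# Dense Wq and Wv weights across all layers (biases ignored).
full_params = LAYERS * 2 * HIDDEN * HIDDEN

print(f"LoRA parameters:        {lora_params:,}")
print(f"Dense Wq/Wv parameters: {full_params:,}")
print(f"Ratio:                  {lora_params / full_params:.4%}")
```

Under these assumptions the adapters hold 786,432 parameters against 50,331,648 in the dense query and value projections, about 1.56%. The task-specific classification head, if trained, would add to the trainable total.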

Training results (evaluated every 500 steps):

Training Loss	Epoch	Step	Validation Loss	Accuracy
1.2132	0.1088	500	0.8717	0.6959
0.908	0.2175	1000	0.7149	0.7473
0.8353	0.3263	1500	0.6474	0.7575
0.8075	0.4351	2000	0.6142	0.7798
0.8011	0.5438	2500	0.5785	0.7867
0.7727	0.6526	3000	0.5643	0.7936
0.7647	0.7614	3500	0.5698	0.7956
0.7731	0.8701	4000	0.5453	0.8011
0.7489	0.9789	4500	0.5336	0.8052
0.7496	1.0877	5000	0.5431	0.8033
0.735	1.1964	5500	0.5231	0.8083
0.7194	1.3052	6000	0.5147	0.8096
0.7307	1.4140	6500	0.5102	0.8112
0.7355	1.5227	7000	0.5223	0.8133
0.7085	1.6315	7500	0.5054	0.8142
0.7206	1.7403	8000	0.5026	0.8157
0.7143	1.8490	8500	0.5126	0.8144
0.7045	1.9578	9000	0.5035	0.8162
0.6972	2.0666	9500	0.4948	0.8178
0.6885	2.1753	10000	0.4890	0.8202
0.7079	2.2841	10500	0.4910	0.8193
0.6874	2.3929	11000	0.4907	0.8222
0.6832	2.5016	11500	0.4875	0.8217
0.6807	2.6104	12000	0.4824	0.8224
0.6865	2.7192	12500	0.4877	0.8227
0.6863	2.8279	13000	0.4821	0.8232
0.6913	2.9367	13500	0.4914	0.8229
0.6996	3.0455	14000	0.4843	0.8241
0.687	3.1542	14500	0.4753	0.8250
0.6896	3.2630	15000	0.4762	0.8251
0.6745	3.3718	15500	0.4753	0.8242
0.6735	3.4805	16000	0.4713	0.8267
0.6764	3.5893	16500	0.4715	0.8259
0.6521	3.6981	17000	0.4669	0.8285
0.6686	3.8068	17500	0.4726	0.8269
0.6721	3.9156	18000	0.4703	0.8273
0.6682	4.0244	18500	0.4660	0.8274
0.6533	4.1331	19000	0.4690	0.8281
0.6547	4.2419	19500	0.4697	0.8282
0.6589	4.3507	20000	0.4640	0.8291
0.6518	4.4594	20500	0.4638	0.8294
0.6739	4.5682	21000	0.4669	0.8285
0.6763	4.6770	21500	0.4628	0.8304
0.6503	4.7857	22000	0.4640	0.8296
0.6659	4.8945	22500	0.4643	0.8295
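
The last logged row is not necessarily the best checkpoint. As a minimal sketch, the snippet below picks the best step from the final six rows of the table, once by accuracy and once by validation loss. The tuple layout and variable names are illustrative and not part of any training script.

```python
# (step, validation_loss, accuracy), copied from the last six rows of the table.
rows = [
    (20000, 0.4640, 0.8291),
    (20500, 0.4638, 0.8294),
    (21000, 0.4669, 0.8285),
    (21500, 0.4628, 0.8304),
    (22000, 0.4640, 0.8296),
    (22500, 0.4643, 0.8295),
]

# Highest accuracy and lowest validation loss may point to different steps,
# so compute both and compare.
best_by_accuracy = max(rows, key=lambda r: r[2])
best_by_loss = min(rows, key=lambda r: r[1])

print("best by accuracy:", best_by_accuracy)
print("best by loss:    ", best_by_loss)
```

For these rows both criteria select step 21500 (validation loss 0.4628, accuracy 0.8304).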