rotating-head-gp-gpt2-medium-wikitext

This model is a fine-tuned version of gpt2-medium on the WikiText dataset. It achieves the following results on the evaluation set (see the perplexity check after the list):

  • Loss: 3.1692
  • Accuracy: 0.4228
  • Perplexity: 23.7877
  • Bleu: 0.1332

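The reported perplexity is consistent with the exponential of the validation loss. A minimal sanity check, assuming perplexity is computed as exp of the mean token-level cross-entropy:

```python
import math

eval_loss = 3.1692                # final validation loss reported above
perplexity = math.exp(eval_loss)  # ≈ 23.79, matching the reported 23.7877 up to rounding
print(f"perplexity = {perplexity:.2f}")
```
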
Model description

More information needed

Intended uses & limitations

More information needed
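
The card leaves this section blank, but the checkpoint is a 355M-parameter causal language model, so the typical use is text generation. A minimal sketch, assuming the checkpoint loads through the standard transformers API; the repository id below is a hypothetical placeholder, and if the rotating-head attention variant ships custom modeling code, `trust_remote_code=True` would be required:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rotating-head-gp-gpt2-medium-wikitext"  # hypothetical placeholder id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # may need trust_remote_code=True

# Generate a short continuation from a prompt.
inputs = tokenizer("The history of natural language processing", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```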

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 7
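
These settings map onto Hugging Face `TrainingArguments` roughly as sketched below; this is a reconstruction from the reported values, not the original training script, and `output_dir` is an assumed placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="rotating-head-gp-gpt2-medium-wikitext",  # assumed placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,    # 10% of total training steps used for warmup
    num_train_epochs=7,
)
```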

Training results

| Training Loss | Epoch  | Step  | Accuracy | Bleu   | Validation Loss | Perplexity |
|:-------------:|:------:|:-----:|:--------:|:------:|:---------------:|:----------:|
| 5.9062        | 0.2806 | 500   | 0.2234   | 0.0493 | 5.7470          | 313.2463   |
| 4.8598        | 0.5612 | 1000  | 0.2811   | 0.0698 | 4.7428          | 114.7554   |
| 4.3025        | 0.8418 | 1500  | 0.3170   | 0.0834 | 4.2329          | 68.9191    |
| 3.9635        | 1.1223 | 2000  | 0.3454   | 0.0932 | 3.9291          | 50.8590    |
| 3.7769        | 1.4029 | 2500  | 0.3636   | 0.1020 | 3.7427          | 42.2098    |
| 3.6738        | 1.6835 | 3000  | 0.3754   | 0.1066 | 3.6225          | 37.4295    |
| 3.5744        | 1.9641 | 3500  | 0.3845   | 0.1118 | 3.5325          | 34.2102    |
| 3.456         | 2.2447 | 4000  | 0.3902   | 0.1139 | 3.4704          | 32.1497    |
| 3.3972        | 2.5253 | 4500  | 0.3955   | 0.1230 | 3.4190          | 30.5384    |
| 3.3654        | 2.8058 | 5000  | 0.4007   | 0.1230 | 3.3686          | 29.0392    |
| 3.247         | 3.0864 | 5500  | 0.4043   | 0.1247 | 3.3328          | 28.0168    |
| 3.2403        | 3.3670 | 6000  | 0.4083   | 0.1298 | 3.2985          | 27.0714    |
| 3.2167        | 3.6476 | 6500  | 0.4112   | 0.1288 | 3.2693          | 26.2922    |
| 3.1903        | 3.9282 | 7000  | 0.4134   | 0.1305 | 3.2456          | 25.6768    |
| 3.1212        | 4.2088 | 7500  | 0.4161   | 0.1325 | 3.2262          | 25.1831    |
| 3.0816        | 4.4893 | 8000  | 0.4176   | 0.1307 | 3.2128          | 24.8480    |
| 3.0917        | 4.7699 | 8500  | 0.4196   | 0.1339 | 3.1985          | 24.4954    |
| 3.0562        | 5.0505 | 9000  | 0.4185   | 0.1326 | 3.2049          | 24.6521    |
| 3.0683        | 5.3311 | 9500  | 0.4195   | 0.1307 | 3.1970          | 24.4597    |
| 3.0502        | 5.6117 | 10000 | 0.4209   | 0.1331 | 3.1857          | 24.1847    |
| 3.0469        | 5.8923 | 10500 | 0.4217   | 0.1309 | 3.1790          | 24.0231    |
| 3.0245        | 6.1728 | 11000 | 0.4205   | 0.1294 | 3.1863          | 24.1979    |
| 3.0203        | 6.4534 | 11500 | 0.4218   | 0.1331 | 3.1783          | 24.0068    |
| 3.0265        | 6.7340 | 12000 | 0.4228   | 0.1332 | 3.1692          | 23.7877    |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0