flan-t5la-small

This model is a fine-tuned version of hrezaei/flan-t5la-small on the generator dataset. It achieves the following results on the evaluation set:

  • Perplexity: 1049.0721
  • Loss: 6.9557
  • Accuracy: 0.0032
  • Lookahead Perplexity: 901264.5464
  • Lookahead Loss: 13.7116
  • Base Perplexity: 1.2211
  • Base Loss: 0.1998
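
As a quick illustration of how this checkpoint might be loaded, here is a minimal sketch using the standard `transformers` auto classes. It assumes the repository behaves like a regular seq2seq (FLAN-T5-style) checkpoint; `trust_remote_code=True` is included as an assumption because the repository appears to define a custom `auto_map`, and the prompt is purely illustrative.

```python
# Minimal loading sketch (assumptions: seq2seq interface like FLAN-T5,
# custom repository code loaded via trust_remote_code=True, example prompt
# is illustrative only).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "hrezaei/flan-t5la-small"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Translate to German: The house is small.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```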

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 1000
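
For readers who want to reproduce this configuration, the following is a rough sketch of the equivalent `transformers.TrainingArguments`. Only the values listed above come from this card; the output directory and the Trainer/dataset wiring are hypothetical placeholders.

```python
# Hedged sketch of the training configuration; only the hyperparameters
# listed in this card are taken from it, everything else is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="flan-t5la-small-finetuned",  # placeholder, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    max_steps=1000,
)
```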

Training results

| Training Loss | Epoch | Step | Perplexity | Validation Loss | Accuracy | Lookahead Perplexity | Lookahead Loss | Base Perplexity | Base Loss |
|---|---|---|---|---|---|---|---|---|---|
| No log | 0.02 | 20 | 1213.8055 | 7.1015 | 0.0032 | 1206535.6385 | 14.0033 | 1.2211 | 0.1998 |
| No log | 0.04 | 40 | 1206.3155 | 7.0953 | 0.0032 | 1191692.1735 | 13.9909 | 1.2211 | 0.1998 |
| No log | 0.06 | 60 | 1199.0232 | 7.0893 | 0.0032 | 1177326.6665 | 13.9788 | 1.2211 | 0.1998 |
| No log | 0.08 | 80 | 1191.7477 | 7.0832 | 0.0032 | 1163083.4402 | 13.9666 | 1.2211 | 0.1998 |
| No log | 0.1 | 100 | 1185.2768 | 7.0777 | 0.0032 | 1150487.3725 | 13.9557 | 1.2211 | 0.1998 |
| No log | 0.12 | 120 | 1178.7315 | 7.0722 | 0.0032 | 1137815.2529 | 13.9446 | 1.2211 | 0.1998 |
| No log | 0.14 | 140 | 1172.3743 | 7.0668 | 0.0032 | 1125574.4234 | 13.9338 | 1.2211 | 0.1998 |
| No log | 0.16 | 160 | 1166.1265 | 7.0614 | 0.0032 | 1113610.3033 | 13.9231 | 1.2211 | 0.1998 |
| No log | 0.18 | 180 | 1160.1897 | 7.0563 | 0.0032 | 1102300.5613 | 13.9129 | 1.2211 | 0.1998 |
| No log | 0.2 | 200 | 1154.5578 | 7.0515 | 0.0032 | 1091624.4349 | 13.9032 | 1.2211 | 0.1998 |
| No log | 0.22 | 220 | 1149.0584 | 7.0467 | 0.0032 | 1081250.7225 | 13.8936 | 1.2211 | 0.1998 |
| No log | 0.24 | 240 | 1143.6147 | 7.0419 | 0.0032 | 1071029.2184 | 13.8841 | 1.2211 | 0.1998 |
| No log | 0.26 | 260 | 1138.3861 | 7.0374 | 0.0032 | 1061257.7230 | 13.8750 | 1.2211 | 0.1998 |
| No log | 0.28 | 280 | 1133.2599 | 7.0329 | 0.0032 | 1051720.2808 | 13.8659 | 1.2211 | 0.1998 |
| No log | 0.3 | 300 | 1128.3106 | 7.0285 | 0.0032 | 1042554.2214 | 13.8572 | 1.2211 | 0.1998 |
| No log | 0.32 | 320 | 1123.6261 | 7.0243 | 0.0032 | 1033917.4563 | 13.8489 | 1.2211 | 0.1998 |
| No log | 0.34 | 340 | 1119.1798 | 7.0204 | 0.0032 | 1025750.7807 | 13.8409 | 1.2211 | 0.1998 |
| No log | 0.36 | 360 | 1114.9575 | 7.0166 | 0.0032 | 1018024.1579 | 13.8334 | 1.2211 | 0.1998 |
| No log | 0.38 | 380 | 1110.8108 | 7.0128 | 0.0032 | 1010467.9853 | 13.8259 | 1.2211 | 0.1998 |
| No log | 0.4 | 400 | 1106.7029 | 7.0091 | 0.0032 | 1003008.0716 | 13.8185 | 1.2211 | 0.1998 |
| No log | 0.42 | 420 | 1102.8909 | 7.0057 | 0.0032 | 996109.4490 | 13.8116 | 1.2211 | 0.1998 |
| No log | 0.44 | 440 | 1099.1298 | 7.0023 | 0.0032 | 989326.7098 | 13.8048 | 1.2211 | 0.1998 |
| No log | 0.46 | 460 | 1095.4724 | 6.9989 | 0.0032 | 982753.7669 | 13.7981 | 1.2211 | 0.1998 |
| No log | 0.48 | 480 | 1091.9375 | 6.9957 | 0.0032 | 976422.6978 | 13.7917 | 1.2211 | 0.1998 |
| 5.8652 | 0.5 | 500 | 1088.7084 | 6.9927 | 0.0032 | 970655.0139 | 13.7857 | 1.2211 | 0.1998 |
| 5.8652 | 0.52 | 520 | 1085.3604 | 6.9897 | 0.0032 | 964694.6469 | 13.7796 | 1.2211 | 0.1998 |
| 5.8652 | 0.54 | 540 | 1082.1193 | 6.9867 | 0.0032 | 958942.3112 | 13.7736 | 1.2211 | 0.1998 |
| 5.8652 | 0.56 | 560 | 1079.2130 | 6.9840 | 0.0032 | 953797.4794 | 13.7682 | 1.2211 | 0.1998 |
| 5.8652 | 0.58 | 580 | 1076.4741 | 6.9814 | 0.0032 | 948962.6627 | 13.7631 | 1.2211 | 0.1998 |
| 5.8652 | 0.6 | 600 | 1073.7903 | 6.9790 | 0.0032 | 944236.1752 | 13.7581 | 1.2211 | 0.1998 |
| 5.8652 | 0.62 | 620 | 1071.2598 | 6.9766 | 0.0032 | 939792.4333 | 13.7534 | 1.2211 | 0.1998 |
| 5.8652 | 1.015 | 640 | 1068.8877 | 6.9744 | 0.0032 | 935634.4488 | 13.7490 | 1.2211 | 0.1998 |
| 5.8652 | 1.035 | 660 | 1066.6616 | 6.9723 | 0.0032 | 931740.4382 | 13.7448 | 1.2211 | 0.1998 |
| 5.8652 | 1.055 | 680 | 1064.6844 | 6.9704 | 0.0032 | 928289.6189 | 13.7411 | 1.2211 | 0.1998 |
| 5.8652 | 1.075 | 700 | 1062.6794 | 6.9685 | 0.0032 | 924796.9256 | 13.7373 | 1.2211 | 0.1998 |
| 5.8652 | 1.095 | 720 | 1060.9499 | 6.9669 | 0.0032 | 921789.0832 | 13.7341 | 1.2211 | 0.1998 |
| 5.8652 | 1.115 | 740 | 1059.3267 | 6.9654 | 0.0032 | 918969.7914 | 13.7310 | 1.2211 | 0.1998 |
| 5.8652 | 1.135 | 760 | 1057.8018 | 6.9639 | 0.0032 | 916326.5630 | 13.7281 | 1.2211 | 0.1998 |
| 5.8652 | 1.155 | 780 | 1056.3939 | 6.9626 | 0.0032 | 913888.4304 | 13.7255 | 1.2211 | 0.1998 |
| 5.8652 | 1.175 | 800 | 1055.1248 | 6.9614 | 0.0032 | 911694.9862 | 13.7231 | 1.2211 | 0.1998 |
| 5.8652 | 1.195 | 820 | 1053.9909 | 6.9603 | 0.0032 | 909736.4048 | 13.7209 | 1.2211 | 0.1998 |
| 5.8652 | 1.215 | 840 | 1052.9837 | 6.9594 | 0.0032 | 907998.2673 | 13.7190 | 1.2211 | 0.1998 |
| 5.8652 | 1.235 | 860 | 1052.0622 | 6.9585 | 0.0032 | 906409.2837 | 13.7172 | 1.2211 | 0.1998 |
| 5.8652 | 1.255 | 880 | 1051.2709 | 6.9578 | 0.0032 | 905046.1663 | 13.7157 | 1.2211 | 0.1998 |
| 5.8652 | 1.275 | 900 | 1050.5934 | 6.9571 | 0.0032 | 903880.6568 | 13.7145 | 1.2211 | 0.1998 |
| 5.8652 | 1.295 | 920 | 1050.0365 | 6.9566 | 0.0032 | 902923.3562 | 13.7134 | 1.2211 | 0.1998 |
| 5.8652 | 1.315 | 940 | 1049.6145 | 6.9562 | 0.0032 | 902196.9879 | 13.7126 | 1.2211 | 0.1998 |
| 5.8652 | 1.335 | 960 | 1049.3147 | 6.9559 | 0.0032 | 901682.1050 | 13.7120 | 1.2211 | 0.1998 |
| 5.8652 | 1.355 | 980 | 1049.1336 | 6.9557 | 0.0032 | 901369.2686 | 13.7117 | 1.2211 | 0.1998 |
| 5.8083 | 1.375 | 1000 | 1049.0721 | 6.9557 | 0.0032 | 901264.5464 | 13.7116 | 1.2211 | 0.1998 |
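
Each perplexity column is the exponential of the corresponding loss column. A quick check against the final evaluation row (an illustrative calculation, not part of the training script; small discrepancies come from the losses being printed rounded to four decimals):

```python
# Perplexity is exp(cross-entropy loss); spot-check the last eval row.
import math

print(math.exp(6.9557))   # ~1049.1   (Perplexity vs. Validation Loss)
print(math.exp(13.7116))  # ~901306   (Lookahead Perplexity vs. Lookahead Loss)
print(math.exp(0.1998))   # ~1.2212   (Base Perplexity vs. Base Loss)
```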

Framework versions

  • Transformers 4.57.0.dev0
  • Pytorch 2.8.0
  • Datasets 4.2.0
  • Tokenizers 0.22.1