# flan-t5la-small
This model is a fine-tuned version of hrezaei/flan-t5la-small on the generator dataset. It achieves the following results on the evaluation set (see the consistency check after the list):
- Perplexity: 1049.0721
- Loss: 6.9557
- Accuracy: 0.0032
- Lookahead Perplexity: 901264.5464
- Lookahead Loss: 13.7116
- Base Perplexity: 1.2211
- Base Loss: 0.1998
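Each reported perplexity is the exponential of its corresponding cross-entropy loss, so the numbers above can be sanity-checked in a few lines of Python:

```python
import math

# Reported evaluation losses from this card.
eval_loss = 6.9557
lookahead_loss = 13.7116
base_loss = 0.1998

print(math.exp(eval_loss))       # ~1049.07  -> Perplexity
print(math.exp(lookahead_loss))  # ~901264   -> Lookahead Perplexity
print(math.exp(base_loss))       # ~1.2211   -> Base Perplexity
```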
## Model description
More information needed
## Intended uses & limitations
More information needed
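Until the author adds details, here is a minimal sketch of loading the checkpoint through the standard Transformers seq2seq API. The prompt is purely illustrative, and the repo's custom `auto_map` entry may additionally require `trust_remote_code=True`:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "hrezaei/flan-t5la-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative prompt only; the intended usage is not documented yet.
inputs = tokenizer("translate English to German: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```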
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 1000
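As a rough reconstruction, the list above corresponds to a `transformers.Seq2SeqTrainingArguments` configuration like the one below. The `output_dir` and the eval/logging cadence are assumptions: the results table suggests evaluation every 20 steps, and the "No log" entries before step 500 match the default logging interval of 500 steps.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the training configuration listed above.
args = Seq2SeqTrainingArguments(
    output_dir="flan-t5la-small",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=1000,
    eval_strategy="steps",  # assumed from the 20-step eval cadence in the table
    eval_steps=20,
)
```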
### Training results
| Training Loss | Epoch | Step | Perplexity | Validation Loss | Accuracy | Lookahead Perplexity | Lookahead Loss | Base Perplexity | Base Loss |
|---|---|---|---|---|---|---|---|---|---|
| No log | 0.02 | 20 | 1213.8055 | 7.1015 | 0.0032 | 1206535.6385 | 14.0033 | 1.2211 | 0.1998 |
| No log | 0.04 | 40 | 1206.3155 | 7.0953 | 0.0032 | 1191692.1735 | 13.9909 | 1.2211 | 0.1998 |
| No log | 0.06 | 60 | 1199.0232 | 7.0893 | 0.0032 | 1177326.6665 | 13.9788 | 1.2211 | 0.1998 |
| No log | 0.08 | 80 | 1191.7477 | 7.0832 | 0.0032 | 1163083.4402 | 13.9666 | 1.2211 | 0.1998 |
| No log | 0.1 | 100 | 1185.2768 | 7.0777 | 0.0032 | 1150487.3725 | 13.9557 | 1.2211 | 0.1998 |
| No log | 0.12 | 120 | 1178.7315 | 7.0722 | 0.0032 | 1137815.2529 | 13.9446 | 1.2211 | 0.1998 |
| No log | 0.14 | 140 | 1172.3743 | 7.0668 | 0.0032 | 1125574.4234 | 13.9338 | 1.2211 | 0.1998 |
| No log | 0.16 | 160 | 1166.1265 | 7.0614 | 0.0032 | 1113610.3033 | 13.9231 | 1.2211 | 0.1998 |
| No log | 0.18 | 180 | 1160.1897 | 7.0563 | 0.0032 | 1102300.5613 | 13.9129 | 1.2211 | 0.1998 |
| No log | 0.2 | 200 | 1154.5578 | 7.0515 | 0.0032 | 1091624.4349 | 13.9032 | 1.2211 | 0.1998 |
| No log | 0.22 | 220 | 1149.0584 | 7.0467 | 0.0032 | 1081250.7225 | 13.8936 | 1.2211 | 0.1998 |
| No log | 0.24 | 240 | 1143.6147 | 7.0419 | 0.0032 | 1071029.2184 | 13.8841 | 1.2211 | 0.1998 |
| No log | 0.26 | 260 | 1138.3861 | 7.0374 | 0.0032 | 1061257.7230 | 13.8750 | 1.2211 | 0.1998 |
| No log | 0.28 | 280 | 1133.2599 | 7.0329 | 0.0032 | 1051720.2808 | 13.8659 | 1.2211 | 0.1998 |
| No log | 0.3 | 300 | 1128.3106 | 7.0285 | 0.0032 | 1042554.2214 | 13.8572 | 1.2211 | 0.1998 |
| No log | 0.32 | 320 | 1123.6261 | 7.0243 | 0.0032 | 1033917.4563 | 13.8489 | 1.2211 | 0.1998 |
| No log | 0.34 | 340 | 1119.1798 | 7.0204 | 0.0032 | 1025750.7807 | 13.8409 | 1.2211 | 0.1998 |
| No log | 0.36 | 360 | 1114.9575 | 7.0166 | 0.0032 | 1018024.1579 | 13.8334 | 1.2211 | 0.1998 |
| No log | 0.38 | 380 | 1110.8108 | 7.0128 | 0.0032 | 1010467.9853 | 13.8259 | 1.2211 | 0.1998 |
| No log | 0.4 | 400 | 1106.7029 | 7.0091 | 0.0032 | 1003008.0716 | 13.8185 | 1.2211 | 0.1998 |
| No log | 0.42 | 420 | 1102.8909 | 7.0057 | 0.0032 | 996109.4490 | 13.8116 | 1.2211 | 0.1998 |
| No log | 0.44 | 440 | 1099.1298 | 7.0023 | 0.0032 | 989326.7098 | 13.8048 | 1.2211 | 0.1998 |
| No log | 0.46 | 460 | 1095.4724 | 6.9989 | 0.0032 | 982753.7669 | 13.7981 | 1.2211 | 0.1998 |
| No log | 0.48 | 480 | 1091.9375 | 6.9957 | 0.0032 | 976422.6978 | 13.7917 | 1.2211 | 0.1998 |
| 5.8652 | 0.5 | 500 | 1088.7084 | 6.9927 | 0.0032 | 970655.0139 | 13.7857 | 1.2211 | 0.1998 |
| 5.8652 | 0.52 | 520 | 1085.3604 | 6.9897 | 0.0032 | 964694.6469 | 13.7796 | 1.2211 | 0.1998 |
| 5.8652 | 0.54 | 540 | 1082.1193 | 6.9867 | 0.0032 | 958942.3112 | 13.7736 | 1.2211 | 0.1998 |
| 5.8652 | 0.56 | 560 | 1079.2130 | 6.9840 | 0.0032 | 953797.4794 | 13.7682 | 1.2211 | 0.1998 |
| 5.8652 | 0.58 | 580 | 1076.4741 | 6.9814 | 0.0032 | 948962.6627 | 13.7631 | 1.2211 | 0.1998 |
| 5.8652 | 0.6 | 600 | 1073.7903 | 6.9790 | 0.0032 | 944236.1752 | 13.7581 | 1.2211 | 0.1998 |
| 5.8652 | 0.62 | 620 | 1071.2598 | 6.9766 | 0.0032 | 939792.4333 | 13.7534 | 1.2211 | 0.1998 |
| 5.8652 | 1.015 | 640 | 1068.8877 | 6.9744 | 0.0032 | 935634.4488 | 13.7490 | 1.2211 | 0.1998 |
| 5.8652 | 1.035 | 660 | 1066.6616 | 6.9723 | 0.0032 | 931740.4382 | 13.7448 | 1.2211 | 0.1998 |
| 5.8652 | 1.055 | 680 | 1064.6844 | 6.9704 | 0.0032 | 928289.6189 | 13.7411 | 1.2211 | 0.1998 |
| 5.8652 | 1.075 | 700 | 1062.6794 | 6.9685 | 0.0032 | 924796.9256 | 13.7373 | 1.2211 | 0.1998 |
| 5.8652 | 1.095 | 720 | 1060.9499 | 6.9669 | 0.0032 | 921789.0832 | 13.7341 | 1.2211 | 0.1998 |
| 5.8652 | 1.115 | 740 | 1059.3267 | 6.9654 | 0.0032 | 918969.7914 | 13.7310 | 1.2211 | 0.1998 |
| 5.8652 | 1.135 | 760 | 1057.8018 | 6.9639 | 0.0032 | 916326.5630 | 13.7281 | 1.2211 | 0.1998 |
| 5.8652 | 1.155 | 780 | 1056.3939 | 6.9626 | 0.0032 | 913888.4304 | 13.7255 | 1.2211 | 0.1998 |
| 5.8652 | 1.175 | 800 | 1055.1248 | 6.9614 | 0.0032 | 911694.9862 | 13.7231 | 1.2211 | 0.1998 |
| 5.8652 | 1.195 | 820 | 1053.9909 | 6.9603 | 0.0032 | 909736.4048 | 13.7209 | 1.2211 | 0.1998 |
| 5.8652 | 1.215 | 840 | 1052.9837 | 6.9594 | 0.0032 | 907998.2673 | 13.7190 | 1.2211 | 0.1998 |
| 5.8652 | 1.235 | 860 | 1052.0622 | 6.9585 | 0.0032 | 906409.2837 | 13.7172 | 1.2211 | 0.1998 |
| 5.8652 | 1.255 | 880 | 1051.2709 | 6.9578 | 0.0032 | 905046.1663 | 13.7157 | 1.2211 | 0.1998 |
| 5.8652 | 1.275 | 900 | 1050.5934 | 6.9571 | 0.0032 | 903880.6568 | 13.7145 | 1.2211 | 0.1998 |
| 5.8652 | 1.295 | 920 | 1050.0365 | 6.9566 | 0.0032 | 902923.3562 | 13.7134 | 1.2211 | 0.1998 |
| 5.8652 | 1.315 | 940 | 1049.6145 | 6.9562 | 0.0032 | 902196.9879 | 13.7126 | 1.2211 | 0.1998 |
| 5.8652 | 1.335 | 960 | 1049.3147 | 6.9559 | 0.0032 | 901682.1050 | 13.7120 | 1.2211 | 0.1998 |
| 5.8652 | 1.355 | 980 | 1049.1336 | 6.9557 | 0.0032 | 901369.2686 | 13.7117 | 1.2211 | 0.1998 |
| 5.8083 | 1.375 | 1000 | 1049.0721 | 6.9557 | 0.0032 | 901264.5464 | 13.7116 | 1.2211 | 0.1998 |
### Framework versions
- Transformers 4.57.0.dev0
- Pytorch 2.8.0
- Datasets 4.2.0
- Tokenizers 0.22.1