# tfa_output_2025_m05_d13_t13h_51m_30s
This model is a fine-tuned version of [internlm/internlm2-math-plus-1_8b](https://huggingface.co/internlm/internlm2-math-plus-1_8b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.2781
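The checkpoint can be loaded with the standard `transformers` API. Below is a minimal loading and generation sketch (the repository id is `brando/tfa_output_2025_m05_d13_t13h_51m_30s`; InternLM2-based checkpoints typically require `trust_remote_code=True`, and the prompt is purely illustrative, since the card does not document an intended prompt format):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "brando/tfa_output_2025_m05_d13_t13h_51m_30s"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Illustrative math prompt; adjust to your use case.
prompt = "Solve for x: 2x + 3 = 11."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```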
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 1e-07
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: paged AdamW (`OptimizerNames.PAGED_ADAMW`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
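These settings map directly onto `transformers.TrainingArguments`. A minimal sketch, assuming a standard `Trainer` setup; the output directory, model, and datasets are placeholders, since the card does not record them, and `optim="paged_adamw_32bit"` (the value behind `OptimizerNames.PAGED_ADAMW`) requires `bitsandbytes` at runtime:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="tfa_output",           # hypothetical path
    learning_rate=1e-07,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,     # effective train batch size: 1 * 8 = 8
    optim="paged_adamw_32bit",         # OptimizerNames.PAGED_ADAMW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)

# Placeholders: the fine-tuning dataset is not documented in this card.
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```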
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 0 | 0 | 1.2781 |
2.6109 | 0.0049 | 25 | 1.2778 |
2.6952 | 0.0098 | 50 | 1.2779 |
2.8036 | 0.0147 | 75 | 1.2780 |
2.8808 | 0.0196 | 100 | 1.2780 |
3.0941 | 0.0245 | 125 | 1.2781 |
2.4797 | 0.0294 | 150 | 1.2779 |
2.7034 | 0.0343 | 175 | 1.2778 |
3.0358 | 0.0392 | 200 | 1.2781 |
2.6713 | 0.0441 | 225 | 1.2779 |
2.8436 | 0.0490 | 250 | 1.2779 |
2.8396 | 0.0539 | 275 | 1.2779 |
2.8333 | 0.0588 | 300 | 1.2779 |
2.9845 | 0.0637 | 325 | 1.2782 |
2.6105 | 0.0686 | 350 | 1.2780 |
2.8874 | 0.0735 | 375 | 1.2781 |
2.626 | 0.0784 | 400 | 1.2779 |
2.6437 | 0.0833 | 425 | 1.2779 |
2.8082 | 0.0882 | 450 | 1.2781 |
2.7265 | 0.0931 | 475 | 1.2781 |
2.3798 | 0.0981 | 500 | 1.2779 |
2.677 | 0.1030 | 525 | 1.2781 |
2.6383 | 0.1079 | 550 | 1.2781 |
2.7663 | 0.1128 | 575 | 1.2779 |
2.6638 | 0.1177 | 600 | 1.2780 |
2.7337 | 0.1226 | 625 | 1.2779 |
2.801 | 0.1275 | 650 | 1.2781 |
2.6107 | 0.1324 | 675 | 1.2781 |
2.745 | 0.1373 | 700 | 1.2779 |
2.7384 | 0.1422 | 725 | 1.2780 |
2.7729 | 0.1471 | 750 | 1.2780 |
2.9625 | 0.1520 | 775 | 1.2781 |
2.4783 | 0.1569 | 800 | 1.2780 |
3.0425 | 0.1618 | 825 | 1.2782 |
2.6405 | 0.1667 | 850 | 1.2778 |
2.9304 | 0.1716 | 875 | 1.2780 |
2.9427 | 0.1765 | 900 | 1.2779 |
2.5965 | 0.1814 | 925 | 1.2779 |
2.8121 | 0.1863 | 950 | 1.2781 |
2.6972 | 0.1912 | 975 | 1.2780 |
2.7454 | 0.1961 | 1000 | 1.2780 |
2.6769 | 0.2010 | 1025 | 1.2779 |
3.0904 | 0.2059 | 1050 | 1.2779 |
2.9542 | 0.2108 | 1075 | 1.2779 |
2.8049 | 0.2157 | 1100 | 1.2780 |
2.668 | 0.2206 | 1125 | 1.2780 |
2.8104 | 0.2255 | 1150 | 1.2780 |
2.5648 | 0.2304 | 1175 | 1.2780 |
2.6159 | 0.2353 | 1200 | 1.2781 |
2.8947 | 0.2402 | 1225 | 1.2781 |
2.6045 | 0.2451 | 1250 | 1.2780 |
2.767 | 0.2500 | 1275 | 1.2780 |
2.9516 | 0.2549 | 1300 | 1.2778 |
2.7013 | 0.2598 | 1325 | 1.2782 |
2.9801 | 0.2647 | 1350 | 1.2779 |
2.4982 | 0.2696 | 1375 | 1.2780 |
3.023 | 0.2745 | 1400 | 1.2780 |
2.7095 | 0.2794 | 1425 | 1.2780 |
2.6959 | 0.2843 | 1450 | 1.2779 |
2.5794 | 0.2893 | 1475 | 1.2779 |
3.0798 | 0.2942 | 1500 | 1.2782 |
2.5299 | 0.2991 | 1525 | 1.2781 |
2.8247 | 0.3040 | 1550 | 1.2780 |
2.6481 | 0.3089 | 1575 | 1.2778 |
2.3977 | 0.3138 | 1600 | 1.2780 |
2.626 | 0.3187 | 1625 | 1.2778 |
2.8101 | 0.3236 | 1650 | 1.2780 |
2.7166 | 0.3285 | 1675 | 1.2780 |
2.9789 | 0.3334 | 1700 | 1.2779 |
2.9734 | 0.3383 | 1725 | 1.2780 |
2.6497 | 0.3432 | 1750 | 1.2781 |
2.7752 | 0.3481 | 1775 | 1.2780 |
3.0049 | 0.3530 | 1800 | 1.2778 |
2.7946 | 0.3579 | 1825 | 1.2778 |
2.7212 | 0.3628 | 1850 | 1.2780 |
2.7503 | 0.3677 | 1875 | 1.2778 |
2.6616 | 0.3726 | 1900 | 1.2781 |
3.1099 | 0.3775 | 1925 | 1.2781 |
2.7114 | 0.3824 | 1950 | 1.2781 |
2.6648 | 0.3873 | 1975 | 1.2781 |
2.8947 | 0.3922 | 2000 | 1.2780 |
2.5636 | 0.3971 | 2025 | 1.2780 |
2.618 | 0.4020 | 2050 | 1.2780 |
2.6153 | 0.4069 | 2075 | 1.2779 |
2.7458 | 0.4118 | 2100 | 1.2780 |
2.896 | 0.4167 | 2125 | 1.2779 |
2.9055 | 0.4216 | 2150 | 1.2781 |
2.8312 | 0.4265 | 2175 | 1.2781 |
2.6273 | 0.4314 | 2200 | 1.2781 |
2.7673 | 0.4363 | 2225 | 1.2780 |
2.887 | 0.4412 | 2250 | 1.2780 |
2.7996 | 0.4461 | 2275 | 1.2780 |
2.6026 | 0.4510 | 2300 | 1.2780 |
2.8637 | 0.4559 | 2325 | 1.2779 |
2.6673 | 0.4608 | 2350 | 1.2780 |
2.7375 | 0.4657 | 2375 | 1.2780 |
2.7014 | 0.4706 | 2400 | 1.2780 |
3.0431 | 0.4755 | 2425 | 1.2780 |
2.7895 | 0.4805 | 2450 | 1.2780 |
2.5445 | 0.4854 | 2475 | 1.2781 |
2.8042 | 0.4903 | 2500 | 1.2781 |
2.4517 | 0.4952 | 2525 | 1.2779 |
3.0145 | 0.5001 | 2550 | 1.2779 |
2.8011 | 0.5050 | 2575 | 1.2779 |
2.7895 | 0.5099 | 2600 | 1.2779 |
2.8871 | 0.5148 | 2625 | 1.2779 |
2.7724 | 0.5197 | 2650 | 1.2779 |
2.5841 | 0.5246 | 2675 | 1.2780 |
2.7891 | 0.5295 | 2700 | 1.2779 |
2.9153 | 0.5344 | 2725 | 1.2779 |
3.0127 | 0.5393 | 2750 | 1.2780 |
2.8079 | 0.5442 | 2775 | 1.2778 |
2.8522 | 0.5491 | 2800 | 1.2780 |
2.6897 | 0.5540 | 2825 | 1.2779 |
2.822 | 0.5589 | 2850 | 1.2777 |
2.8534 | 0.5638 | 2875 | 1.2778 |
2.5255 | 0.5687 | 2900 | 1.2778 |
2.6427 | 0.5736 | 2925 | 1.2780 |
3.0485 | 0.5785 | 2950 | 1.2782 |
3.0283 | 0.5834 | 2975 | 1.2779 |
2.9914 | 0.5883 | 3000 | 1.2780 |
2.9151 | 0.5932 | 3025 | 1.2783 |
2.397 | 0.5981 | 3050 | 1.2780 |
2.8832 | 0.6030 | 3075 | 1.2780 |
2.8657 | 0.6079 | 3100 | 1.2780 |
2.5352 | 0.6128 | 3125 | 1.2779 |
2.8679 | 0.6177 | 3150 | 1.2778 |
2.6386 | 0.6226 | 3175 | 1.2780 |
2.9986 | 0.6275 | 3200 | 1.2779 |
2.842 | 0.6324 | 3225 | 1.2779 |
2.6134 | 0.6373 | 3250 | 1.2780 |
2.8062 | 0.6422 | 3275 | 1.2781 |
2.8878 | 0.6471 | 3300 | 1.2780 |
2.6385 | 0.6520 | 3325 | 1.2780 |
2.7413 | 0.6569 | 3350 | 1.2780 |
2.8832 | 0.6618 | 3375 | 1.2781 |
2.782 | 0.6667 | 3400 | 1.2780 |
2.7907 | 0.6717 | 3425 | 1.2779 |
2.7367 | 0.6766 | 3450 | 1.2781 |
2.8375 | 0.6815 | 3475 | 1.2780 |
2.8279 | 0.6864 | 3500 | 1.2780 |
2.7932 | 0.6913 | 3525 | 1.2781 |
2.6823 | 0.6962 | 3550 | 1.2780 |
2.7605 | 0.7011 | 3575 | 1.2779 |
2.8804 | 0.7060 | 3600 | 1.2780 |
2.769 | 0.7109 | 3625 | 1.2781 |
2.6696 | 0.7158 | 3650 | 1.2783 |
2.7543 | 0.7207 | 3675 | 1.2781 |
2.7719 | 0.7256 | 3700 | 1.2780 |
2.7031 | 0.7305 | 3725 | 1.2782 |
2.9465 | 0.7354 | 3750 | 1.2780 |
2.9159 | 0.7403 | 3775 | 1.2780 |
2.8126 | 0.7452 | 3800 | 1.2780 |
2.8721 | 0.7501 | 3825 | 1.2781 |
2.875 | 0.7550 | 3850 | 1.2781 |
2.7 | 0.7599 | 3875 | 1.2781 |
2.8479 | 0.7648 | 3900 | 1.2782 |
2.7953 | 0.7697 | 3925 | 1.2779 |
2.7945 | 0.7746 | 3950 | 1.2778 |
2.8429 | 0.7795 | 3975 | 1.2777 |
2.8151 | 0.7844 | 4000 | 1.2779 |
2.9726 | 0.7893 | 4025 | 1.2779 |
2.6825 | 0.7942 | 4050 | 1.2781 |
2.425 | 0.7991 | 4075 | 1.2781 |
2.5809 | 0.8040 | 4100 | 1.2778 |
3.0961 | 0.8089 | 4125 | 1.2780 |
2.8323 | 0.8138 | 4150 | 1.2780 |
2.7579 | 0.8187 | 4175 | 1.2781 |
2.8227 | 0.8236 | 4200 | 1.2780 |
3.046 | 0.8285 | 4225 | 1.2780 |
2.593 | 0.8334 | 4250 | 1.2779 |
3.0566 | 0.8383 | 4275 | 1.2779 |
2.4772 | 0.8432 | 4300 | 1.2780 |
2.9035 | 0.8481 | 4325 | 1.2779 |
2.6611 | 0.8530 | 4350 | 1.2780 |
2.789 | 0.8579 | 4375 | 1.2780 |
2.5477 | 0.8629 | 4400 | 1.2781 |
2.65 | 0.8678 | 4425 | 1.2780 |
2.7394 | 0.8727 | 4450 | 1.2779 |
2.9178 | 0.8776 | 4475 | 1.2781 |
2.6875 | 0.8825 | 4500 | 1.2781 |
2.6716 | 0.8874 | 4525 | 1.2780 |
2.66 | 0.8923 | 4550 | 1.2783 |
3.0137 | 0.8972 | 4575 | 1.2780 |
2.714 | 0.9021 | 4600 | 1.2779 |
2.8224 | 0.9070 | 4625 | 1.2780 |
2.8566 | 0.9119 | 4650 | 1.2782 |
2.6979 | 0.9168 | 4675 | 1.2781 |
3.0773 | 0.9217 | 4700 | 1.2780 |
2.7923 | 0.9266 | 4725 | 1.2780 |
2.6275 | 0.9315 | 4750 | 1.2781 |
2.8812 | 0.9364 | 4775 | 1.2779 |
2.8417 | 0.9413 | 4800 | 1.2780 |
2.8717 | 0.9462 | 4825 | 1.2780 |
2.6871 | 0.9511 | 4850 | 1.2780 |
2.8382 | 0.9560 | 4875 | 1.2781 |
2.8615 | 0.9609 | 4900 | 1.2779 |
2.9204 | 0.9658 | 4925 | 1.2779 |
2.7162 | 0.9707 | 4950 | 1.2781 |
2.5257 | 0.9756 | 4975 | 1.2780 |
2.7771 | 0.9805 | 5000 | 1.2779 |
2.8008 | 0.9854 | 5025 | 1.2780 |
2.8659 | 0.9903 | 5050 | 1.2778 |
2.8939 | 0.9952 | 5075 | 1.2781 |
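Across the epoch, validation loss stays essentially flat, moving only between 1.2777 and 1.2783, which is consistent with the very small constant learning rate of 1e-07.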
### Framework versions
- Transformers 4.51.3
- Pytorch 2.1.2+cu121
- Datasets 3.6.0
- Tokenizers 0.21.1
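For reproduction, the pinned versions above can be verified at runtime; a small optional check (not part of the original card):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions listed in this card; torch includes the CUDA build tag.
expected = {
    "transformers": ("4.51.3", transformers.__version__),
    "torch": ("2.1.2+cu121", torch.__version__),
    "datasets": ("3.6.0", datasets.__version__),
    "tokenizers": ("0.21.1", tokenizers.__version__),
}
for name, (want, got) in expected.items():
    status = "OK" if got == want else f"mismatch (found {got})"
    print(f"{name}=={want}: {status}")
```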