tfa_output_2025_m05_d13_t13h_51m_30s

This model is a fine-tuned version of internlm/internlm2-math-plus-1_8b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2781
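
For context, if this validation loss is the usual mean per-token cross-entropy reported by the Trainer (an assumption, since the evaluation setup is not documented here), it corresponds to a perplexity of roughly exp(1.2781) ≈ 3.59:

```python
import math

eval_loss = 1.2781  # validation loss reported above
# Assuming mean per-token cross-entropy, perplexity = exp(loss).
print(math.exp(eval_loss))  # ≈ 3.59
```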

Model description

More information needed beyond the basics: the checkpoint is a 1.89B-parameter causal language model, stored as BF16 safetensors and fine-tuned from internlm/internlm2-math-plus-1_8b.

Intended uses & limitations

More information needed
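
Since no usage snippet is provided, here is a minimal, untested loading sketch. The repo id is taken from this card, and `trust_remote_code=True` is an assumption based on InternLM2-family checkpoints typically shipping custom modeling code:

```python
# A minimal loading sketch (repo id from this card; trust_remote_code=True is
# an assumption, as InternLM2-family checkpoints usually ship custom code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "brando/tfa_output_2025_m05_d13_t13h_51m_30s"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    trust_remote_code=True,
)

prompt = "Question: What is 12 * 7?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```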

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: PAGED_ADAMW with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
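
A minimal sketch reconstructing these settings as Hugging Face `TrainingArguments`. The `output_dir` is illustrative, `bf16=True` is inferred from the card's BF16 tensor type, and `OptimizerNames.PAGED_ADAMW` maps to the string `"paged_adamw_32bit"` in recent Transformers releases; everything else comes from the list above:

```python
# A hedged reconstruction of the hyperparameters above; this is not the
# authors' training script, only an equivalent configuration sketch.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tfa_output_2025_m05_d13_t13h_51m_30s",  # assumption
    learning_rate=1e-07,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # effective train batch: 1 * 8 = 8
    optim="paged_adamw_32bit",      # OptimizerNames.PAGED_ADAMW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                      # assumption (tensor type: BF16)
)
```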

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 0 | 0 | 1.2781 |
| 2.6109 | 0.0049 | 25 | 1.2778 |
| 2.6952 | 0.0098 | 50 | 1.2779 |
| 2.8036 | 0.0147 | 75 | 1.2780 |
| 2.8808 | 0.0196 | 100 | 1.2780 |
| 3.0941 | 0.0245 | 125 | 1.2781 |
| 2.4797 | 0.0294 | 150 | 1.2779 |
| 2.7034 | 0.0343 | 175 | 1.2778 |
| 3.0358 | 0.0392 | 200 | 1.2781 |
| 2.6713 | 0.0441 | 225 | 1.2779 |
| 2.8436 | 0.0490 | 250 | 1.2779 |
| 2.8396 | 0.0539 | 275 | 1.2779 |
| 2.8333 | 0.0588 | 300 | 1.2779 |
| 2.9845 | 0.0637 | 325 | 1.2782 |
| 2.6105 | 0.0686 | 350 | 1.2780 |
| 2.8874 | 0.0735 | 375 | 1.2781 |
| 2.626 | 0.0784 | 400 | 1.2779 |
| 2.6437 | 0.0833 | 425 | 1.2779 |
| 2.8082 | 0.0882 | 450 | 1.2781 |
| 2.7265 | 0.0931 | 475 | 1.2781 |
| 2.3798 | 0.0981 | 500 | 1.2779 |
| 2.677 | 0.1030 | 525 | 1.2781 |
| 2.6383 | 0.1079 | 550 | 1.2781 |
| 2.7663 | 0.1128 | 575 | 1.2779 |
| 2.6638 | 0.1177 | 600 | 1.2780 |
| 2.7337 | 0.1226 | 625 | 1.2779 |
| 2.801 | 0.1275 | 650 | 1.2781 |
| 2.6107 | 0.1324 | 675 | 1.2781 |
| 2.745 | 0.1373 | 700 | 1.2779 |
| 2.7384 | 0.1422 | 725 | 1.2780 |
| 2.7729 | 0.1471 | 750 | 1.2780 |
| 2.9625 | 0.1520 | 775 | 1.2781 |
| 2.4783 | 0.1569 | 800 | 1.2780 |
| 3.0425 | 0.1618 | 825 | 1.2782 |
| 2.6405 | 0.1667 | 850 | 1.2778 |
| 2.9304 | 0.1716 | 875 | 1.2780 |
| 2.9427 | 0.1765 | 900 | 1.2779 |
| 2.5965 | 0.1814 | 925 | 1.2779 |
| 2.8121 | 0.1863 | 950 | 1.2781 |
| 2.6972 | 0.1912 | 975 | 1.2780 |
| 2.7454 | 0.1961 | 1000 | 1.2780 |
| 2.6769 | 0.2010 | 1025 | 1.2779 |
| 3.0904 | 0.2059 | 1050 | 1.2779 |
| 2.9542 | 0.2108 | 1075 | 1.2779 |
| 2.8049 | 0.2157 | 1100 | 1.2780 |
| 2.668 | 0.2206 | 1125 | 1.2780 |
| 2.8104 | 0.2255 | 1150 | 1.2780 |
| 2.5648 | 0.2304 | 1175 | 1.2780 |
| 2.6159 | 0.2353 | 1200 | 1.2781 |
| 2.8947 | 0.2402 | 1225 | 1.2781 |
| 2.6045 | 0.2451 | 1250 | 1.2780 |
| 2.767 | 0.2500 | 1275 | 1.2780 |
| 2.9516 | 0.2549 | 1300 | 1.2778 |
| 2.7013 | 0.2598 | 1325 | 1.2782 |
| 2.9801 | 0.2647 | 1350 | 1.2779 |
| 2.4982 | 0.2696 | 1375 | 1.2780 |
| 3.023 | 0.2745 | 1400 | 1.2780 |
| 2.7095 | 0.2794 | 1425 | 1.2780 |
| 2.6959 | 0.2843 | 1450 | 1.2779 |
| 2.5794 | 0.2893 | 1475 | 1.2779 |
| 3.0798 | 0.2942 | 1500 | 1.2782 |
| 2.5299 | 0.2991 | 1525 | 1.2781 |
| 2.8247 | 0.3040 | 1550 | 1.2780 |
| 2.6481 | 0.3089 | 1575 | 1.2778 |
| 2.3977 | 0.3138 | 1600 | 1.2780 |
| 2.626 | 0.3187 | 1625 | 1.2778 |
| 2.8101 | 0.3236 | 1650 | 1.2780 |
| 2.7166 | 0.3285 | 1675 | 1.2780 |
| 2.9789 | 0.3334 | 1700 | 1.2779 |
| 2.9734 | 0.3383 | 1725 | 1.2780 |
| 2.6497 | 0.3432 | 1750 | 1.2781 |
| 2.7752 | 0.3481 | 1775 | 1.2780 |
| 3.0049 | 0.3530 | 1800 | 1.2778 |
| 2.7946 | 0.3579 | 1825 | 1.2778 |
| 2.7212 | 0.3628 | 1850 | 1.2780 |
| 2.7503 | 0.3677 | 1875 | 1.2778 |
| 2.6616 | 0.3726 | 1900 | 1.2781 |
| 3.1099 | 0.3775 | 1925 | 1.2781 |
| 2.7114 | 0.3824 | 1950 | 1.2781 |
| 2.6648 | 0.3873 | 1975 | 1.2781 |
| 2.8947 | 0.3922 | 2000 | 1.2780 |
| 2.5636 | 0.3971 | 2025 | 1.2780 |
| 2.618 | 0.4020 | 2050 | 1.2780 |
| 2.6153 | 0.4069 | 2075 | 1.2779 |
| 2.7458 | 0.4118 | 2100 | 1.2780 |
| 2.896 | 0.4167 | 2125 | 1.2779 |
| 2.9055 | 0.4216 | 2150 | 1.2781 |
| 2.8312 | 0.4265 | 2175 | 1.2781 |
| 2.6273 | 0.4314 | 2200 | 1.2781 |
| 2.7673 | 0.4363 | 2225 | 1.2780 |
| 2.887 | 0.4412 | 2250 | 1.2780 |
| 2.7996 | 0.4461 | 2275 | 1.2780 |
| 2.6026 | 0.4510 | 2300 | 1.2780 |
| 2.8637 | 0.4559 | 2325 | 1.2779 |
| 2.6673 | 0.4608 | 2350 | 1.2780 |
| 2.7375 | 0.4657 | 2375 | 1.2780 |
| 2.7014 | 0.4706 | 2400 | 1.2780 |
| 3.0431 | 0.4755 | 2425 | 1.2780 |
| 2.7895 | 0.4805 | 2450 | 1.2780 |
| 2.5445 | 0.4854 | 2475 | 1.2781 |
| 2.8042 | 0.4903 | 2500 | 1.2781 |
| 2.4517 | 0.4952 | 2525 | 1.2779 |
| 3.0145 | 0.5001 | 2550 | 1.2779 |
| 2.8011 | 0.5050 | 2575 | 1.2779 |
| 2.7895 | 0.5099 | 2600 | 1.2779 |
| 2.8871 | 0.5148 | 2625 | 1.2779 |
| 2.7724 | 0.5197 | 2650 | 1.2779 |
| 2.5841 | 0.5246 | 2675 | 1.2780 |
| 2.7891 | 0.5295 | 2700 | 1.2779 |
| 2.9153 | 0.5344 | 2725 | 1.2779 |
| 3.0127 | 0.5393 | 2750 | 1.2780 |
| 2.8079 | 0.5442 | 2775 | 1.2778 |
| 2.8522 | 0.5491 | 2800 | 1.2780 |
| 2.6897 | 0.5540 | 2825 | 1.2779 |
| 2.822 | 0.5589 | 2850 | 1.2777 |
| 2.8534 | 0.5638 | 2875 | 1.2778 |
| 2.5255 | 0.5687 | 2900 | 1.2778 |
| 2.6427 | 0.5736 | 2925 | 1.2780 |
| 3.0485 | 0.5785 | 2950 | 1.2782 |
| 3.0283 | 0.5834 | 2975 | 1.2779 |
| 2.9914 | 0.5883 | 3000 | 1.2780 |
| 2.9151 | 0.5932 | 3025 | 1.2783 |
| 2.397 | 0.5981 | 3050 | 1.2780 |
| 2.8832 | 0.6030 | 3075 | 1.2780 |
| 2.8657 | 0.6079 | 3100 | 1.2780 |
| 2.5352 | 0.6128 | 3125 | 1.2779 |
| 2.8679 | 0.6177 | 3150 | 1.2778 |
| 2.6386 | 0.6226 | 3175 | 1.2780 |
| 2.9986 | 0.6275 | 3200 | 1.2779 |
| 2.842 | 0.6324 | 3225 | 1.2779 |
| 2.6134 | 0.6373 | 3250 | 1.2780 |
| 2.8062 | 0.6422 | 3275 | 1.2781 |
| 2.8878 | 0.6471 | 3300 | 1.2780 |
| 2.6385 | 0.6520 | 3325 | 1.2780 |
| 2.7413 | 0.6569 | 3350 | 1.2780 |
| 2.8832 | 0.6618 | 3375 | 1.2781 |
| 2.782 | 0.6667 | 3400 | 1.2780 |
| 2.7907 | 0.6717 | 3425 | 1.2779 |
| 2.7367 | 0.6766 | 3450 | 1.2781 |
| 2.8375 | 0.6815 | 3475 | 1.2780 |
| 2.8279 | 0.6864 | 3500 | 1.2780 |
| 2.7932 | 0.6913 | 3525 | 1.2781 |
| 2.6823 | 0.6962 | 3550 | 1.2780 |
| 2.7605 | 0.7011 | 3575 | 1.2779 |
| 2.8804 | 0.7060 | 3600 | 1.2780 |
| 2.769 | 0.7109 | 3625 | 1.2781 |
| 2.6696 | 0.7158 | 3650 | 1.2783 |
| 2.7543 | 0.7207 | 3675 | 1.2781 |
| 2.7719 | 0.7256 | 3700 | 1.2780 |
| 2.7031 | 0.7305 | 3725 | 1.2782 |
| 2.9465 | 0.7354 | 3750 | 1.2780 |
| 2.9159 | 0.7403 | 3775 | 1.2780 |
| 2.8126 | 0.7452 | 3800 | 1.2780 |
| 2.8721 | 0.7501 | 3825 | 1.2781 |
| 2.875 | 0.7550 | 3850 | 1.2781 |
| 2.7 | 0.7599 | 3875 | 1.2781 |
| 2.8479 | 0.7648 | 3900 | 1.2782 |
| 2.7953 | 0.7697 | 3925 | 1.2779 |
| 2.7945 | 0.7746 | 3950 | 1.2778 |
| 2.8429 | 0.7795 | 3975 | 1.2777 |
| 2.8151 | 0.7844 | 4000 | 1.2779 |
| 2.9726 | 0.7893 | 4025 | 1.2779 |
| 2.6825 | 0.7942 | 4050 | 1.2781 |
| 2.425 | 0.7991 | 4075 | 1.2781 |
| 2.5809 | 0.8040 | 4100 | 1.2778 |
| 3.0961 | 0.8089 | 4125 | 1.2780 |
| 2.8323 | 0.8138 | 4150 | 1.2780 |
| 2.7579 | 0.8187 | 4175 | 1.2781 |
| 2.8227 | 0.8236 | 4200 | 1.2780 |
| 3.046 | 0.8285 | 4225 | 1.2780 |
| 2.593 | 0.8334 | 4250 | 1.2779 |
| 3.0566 | 0.8383 | 4275 | 1.2779 |
| 2.4772 | 0.8432 | 4300 | 1.2780 |
| 2.9035 | 0.8481 | 4325 | 1.2779 |
| 2.6611 | 0.8530 | 4350 | 1.2780 |
| 2.789 | 0.8579 | 4375 | 1.2780 |
| 2.5477 | 0.8629 | 4400 | 1.2781 |
| 2.65 | 0.8678 | 4425 | 1.2780 |
| 2.7394 | 0.8727 | 4450 | 1.2779 |
| 2.9178 | 0.8776 | 4475 | 1.2781 |
| 2.6875 | 0.8825 | 4500 | 1.2781 |
| 2.6716 | 0.8874 | 4525 | 1.2780 |
| 2.66 | 0.8923 | 4550 | 1.2783 |
| 3.0137 | 0.8972 | 4575 | 1.2780 |
| 2.714 | 0.9021 | 4600 | 1.2779 |
| 2.8224 | 0.9070 | 4625 | 1.2780 |
| 2.8566 | 0.9119 | 4650 | 1.2782 |
| 2.6979 | 0.9168 | 4675 | 1.2781 |
| 3.0773 | 0.9217 | 4700 | 1.2780 |
| 2.7923 | 0.9266 | 4725 | 1.2780 |
| 2.6275 | 0.9315 | 4750 | 1.2781 |
| 2.8812 | 0.9364 | 4775 | 1.2779 |
| 2.8417 | 0.9413 | 4800 | 1.2780 |
| 2.8717 | 0.9462 | 4825 | 1.2780 |
| 2.6871 | 0.9511 | 4850 | 1.2780 |
| 2.8382 | 0.9560 | 4875 | 1.2781 |
| 2.8615 | 0.9609 | 4900 | 1.2779 |
| 2.9204 | 0.9658 | 4925 | 1.2779 |
| 2.7162 | 0.9707 | 4950 | 1.2781 |
| 2.5257 | 0.9756 | 4975 | 1.2780 |
| 2.7771 | 0.9805 | 5000 | 1.2779 |
| 2.8008 | 0.9854 | 5025 | 1.2780 |
| 2.8659 | 0.9903 | 5050 | 1.2778 |
| 2.8939 | 0.9952 | 5075 | 1.2781 |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.1.2+cu121
  • Datasets 3.6.0
  • Tokenizers 0.21.1
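
A small sanity check that a runtime environment matches the versions above:

```python
# Verify the runtime matches the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.51.3",
    "torch": "2.1.2+cu121",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}
actual = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if actual[name] == want else f"got {actual[name]}"
    print(f"{name}: expected {want} -> {status}")
```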
