modernbert-flowlm-tulu

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6810
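
The checkpoint can be loaded with the standard transformers Auto classes. The sketch below is hedged: the card does not document the training objective, so a masked-LM head is assumed here (ModernBERT is an encoder); verify against the checkpoint's config before relying on it. For orientation, if the reported loss is a mean token-level cross-entropy in nats, it corresponds to a perplexity of about exp(1.6810) ≈ 5.37.

```python
# Minimal loading sketch. The head type is an assumption: the card does not
# state the training objective, so AutoModelForMaskedLM is used because
# ModernBERT is an encoder. Check the checkpoint's config before use.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "tommyp111/modernbert-flowlm-tulu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Fill-in-the-blank probe at the mask position.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))
```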

Model description

A fine-tuned ModernBERT-large checkpoint with roughly 396M parameters, published as BF16 safetensors weights. Further details on the training objective and intended behavior are not yet documented.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments reconstruction follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • num_epochs: 6
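
A hedged reconstruction of these settings as TrainingArguments is below; the actual training script is not published. output_dir is a placeholder, and eval_steps / logging_steps are inferred from the results table rather than stated in the card.

```python
# Hedged reconstruction of the hyperparameters above; not the original script.
# output_dir is a placeholder. eval_steps/logging_steps are inferred from the
# results table (evaluation every 200 steps; the training-loss column repeats
# in blocks consistent with logging every 500 steps).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-flowlm-tulu",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=32,       # x4 GPUs -> total train batch size 128
    per_device_eval_batch_size=32,        # x4 GPUs -> total eval batch size 128
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    num_train_epochs=6,
    eval_strategy="steps",
    eval_steps=200,
    logging_steps=500,
    bf16=True,                            # weights are published in BF16
)
```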

Training results

Evaluation ran every 200 steps. "No log" means no training loss had been logged yet; afterwards the training-loss column repeats the most recently logged value, consistent with the loss being logged every 500 steps. Validation loss plateaus at 1.6810 from roughly epoch 4 onward.

Training Loss | Epoch | Step | Validation Loss
No log 0.0332 200 1.9389
No log 0.0664 400 1.8775
1.9745 0.0997 600 1.8439
1.9745 0.1329 800 1.8225
1.8208 0.1661 1000 1.8054
1.8208 0.1993 1200 1.7931
1.8208 0.2326 1400 1.7823
1.7755 0.2658 1600 1.7738
1.7755 0.2990 1800 1.7662
1.754 0.3322 2000 1.7595
1.754 0.3654 2200 1.7548
1.754 0.3987 2400 1.7496
1.748 0.4319 2600 1.7454
1.748 0.4651 2800 1.7418
1.7386 0.4983 3000 1.7375
1.7386 0.5316 3200 1.7339
1.7386 0.5648 3400 1.7309
1.7238 0.5980 3600 1.7283
1.7238 0.6312 3800 1.7255
1.7212 0.6645 4000 1.7230
1.7212 0.6977 4200 1.7210
1.7212 0.7309 4400 1.7186
1.7097 0.7641 4600 1.7161
1.7097 0.7973 4800 1.7144
1.6998 0.8306 5000 1.7128
1.6998 0.8638 5200 1.7110
1.6998 0.8970 5400 1.7095
1.7027 0.9302 5600 1.7085
1.7027 0.9635 5800 1.7069
1.7137 0.9967 6000 1.7053
1.7137 1.0299 6200 1.7043
1.7137 1.0631 6400 1.7031
1.7008 1.0963 6600 1.7021
1.7008 1.1296 6800 1.7009
1.6911 1.1628 7000 1.7000
1.6911 1.1960 7200 1.6990
1.6911 1.2292 7400 1.6979
1.6869 1.2625 7600 1.6971
1.6869 1.2957 7800 1.6963
1.6845 1.3289 8000 1.6959
1.6845 1.3621 8200 1.6952
1.6845 1.3953 8400 1.6943
1.6849 1.4286 8600 1.6936
1.6849 1.4618 8800 1.6931
1.6774 1.4950 9000 1.6924
1.6774 1.5282 9200 1.6919
1.6774 1.5615 9400 1.6915
1.6595 1.5947 9600 1.6908
1.6595 1.6279 9800 1.6905
1.6812 1.6611 10000 1.6901
1.6812 1.6944 10200 1.6893
1.6812 1.7276 10400 1.6892
1.681 1.7608 10600 1.6889
1.681 1.7940 10800 1.6882
1.6775 1.8272 11000 1.6877
1.6775 1.8605 11200 1.6875
1.6775 1.8937 11400 1.6874
1.6709 1.9269 11600 1.6868
1.6709 1.9601 11800 1.6865
1.6713 1.9934 12000 1.6864
1.6713 2.0266 12200 1.6861
1.6713 2.0598 12400 1.6859
1.6751 2.0930 12600 1.6857
1.6751 2.1262 12800 1.6856
1.6711 2.1595 13000 1.6852
1.6711 2.1927 13200 1.6851
1.6711 2.2259 13400 1.6847
1.6688 2.2591 13600 1.6845
1.6688 2.2924 13800 1.6845
1.6772 2.3256 14000 1.6843
1.6772 2.3588 14200 1.6840
1.6772 2.3920 14400 1.6838
1.6736 2.4252 14600 1.6838
1.6736 2.4585 14800 1.6835
1.6706 2.4917 15000 1.6834
1.6706 2.5249 15200 1.6833
1.6706 2.5581 15400 1.6832
1.6875 2.5914 15600 1.6831
1.6875 2.6246 15800 1.6830
1.6768 2.6578 16000 1.6830
1.6768 2.6910 16200 1.6828
1.6768 2.7243 16400 1.6827
1.6687 2.7575 16600 1.6825
1.6687 2.7907 16800 1.6824
1.6825 2.8239 17000 1.6824
1.6825 2.8571 17200 1.6823
1.6825 2.8904 17400 1.6823
1.659 2.9236 17600 1.6821
1.659 2.9568 17800 1.6821
1.6602 2.9900 18000 1.6821
1.6602 3.0233 18200 1.6820
1.6602 3.0565 18400 1.6819
1.6733 3.0897 18600 1.6818
1.6733 3.1229 18800 1.6818
1.6549 3.1561 19000 1.6818
1.6549 3.1894 19200 1.6818
1.6549 3.2226 19400 1.6817
1.6702 3.2558 19600 1.6817
1.6702 3.2890 19800 1.6816
1.6834 3.3223 20000 1.6816
1.6834 3.3555 20200 1.6816
1.6834 3.3887 20400 1.6816
1.6614 3.4219 20600 1.6814
1.6614 3.4551 20800 1.6815
1.6807 3.4884 21000 1.6814
1.6807 3.5216 21200 1.6814
1.6807 3.5548 21400 1.6814
1.6731 3.5880 21600 1.6813
1.6731 3.6213 21800 1.6813
1.6742 3.6545 22000 1.6813
1.6742 3.6877 22200 1.6812
1.6742 3.7209 22400 1.6812
1.6676 3.7542 22600 1.6812
1.6676 3.7874 22800 1.6812
1.6521 3.8206 23000 1.6812
1.6521 3.8538 23200 1.6812
1.6521 3.8870 23400 1.6812
1.6715 3.9203 23600 1.6812
1.6715 3.9535 23800 1.6812
1.6681 3.9867 24000 1.6811
1.6681 4.0199 24200 1.6811
1.6681 4.0532 24400 1.6811
1.6582 4.0864 24600 1.6811
1.6582 4.1196 24800 1.6811
1.6742 4.1528 25000 1.6810
1.6742 4.1860 25200 1.6810
1.6742 4.2193 25400 1.6810
1.6789 4.2525 25600 1.6811
1.6789 4.2857 25800 1.6810
1.6629 4.3189 26000 1.6810
1.6629 4.3522 26200 1.6811
1.6629 4.3854 26400 1.6810
1.6597 4.4186 26600 1.6810
1.6597 4.4518 26800 1.6810
1.6652 4.4850 27000 1.6810
1.6652 4.5183 27200 1.6810
1.6652 4.5515 27400 1.6810
1.6695 4.5847 27600 1.6810
1.6695 4.6179 27800 1.6810
1.6708 4.6512 28000 1.6810
1.6708 4.6844 28200 1.6810
1.6708 4.7176 28400 1.6810
1.6652 4.7508 28600 1.6810
1.6652 4.7841 28800 1.6810
1.6595 4.8173 29000 1.6810
1.6595 4.8505 29200 1.6810
1.6595 4.8837 29400 1.6810
1.6703 4.9169 29600 1.6810
1.6703 4.9502 29800 1.6810
1.6695 4.9834 30000 1.6810
1.6695 5.0166 30200 1.6810
1.6695 5.0498 30400 1.6810
1.6569 5.0831 30600 1.6810
1.6569 5.1163 30800 1.6810
1.6733 5.1495 31000 1.6810
1.6733 5.1827 31200 1.6810
1.6733 5.2159 31400 1.6810
1.6808 5.2492 31600 1.6810
1.6808 5.2824 31800 1.6810
1.6678 5.3156 32000 1.6810
1.6678 5.3488 32200 1.6810
1.6678 5.3821 32400 1.6810
1.6737 5.4153 32600 1.6810
1.6737 5.4485 32800 1.6810
1.6751 5.4817 33000 1.6810
1.6751 5.5150 33200 1.6810
1.6751 5.5482 33400 1.6810
1.6709 5.5814 33600 1.6810
1.6709 5.6146 33800 1.6810
1.657 5.6478 34000 1.6810
1.657 5.6811 34200 1.6810
1.657 5.7143 34400 1.6810
1.6678 5.7475 34600 1.6810
1.6678 5.7807 34800 1.6810
1.6635 5.8140 35000 1.6810
1.6635 5.8472 35200 1.6810
1.6635 5.8804 35400 1.6810
1.6781 5.9136 35600 1.6810
1.6781 5.9468 35800 1.6810
1.6722 5.9801 36000 1.6810

Framework versions

  • Transformers 4.53.0
  • PyTorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.2
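
A small convenience sketch (not part of the original card) to check a local environment against the versions above; the +cu126 suffix indicates the CUDA 12.6 PyTorch build:

```python
# Compare installed package versions against those listed in the card.
import datasets, tokenizers, torch, transformers

expected = {
    "transformers": (transformers.__version__, "4.53.0"),
    "torch": (torch.__version__, "2.7.1+cu126"),
    "datasets": (datasets.__version__, "3.6.0"),
    "tokenizers": (tokenizers.__version__, "0.21.2"),
}
for name, (found, want) in expected.items():
    status = "OK" if found == want else f"expected {want}"
    print(f"{name}: {found} ({status})")
```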