en_mlm_child_13_new

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1155
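
Assuming this is the mean cross-entropy over masked tokens (the standard evaluation loss for masked language modeling), it corresponds to a perplexity of about exp(2.1155) ≈ 8.3:

```python
import math

# Perplexity is the exponential of the mean cross-entropy loss.
print(math.exp(2.1155))  # ≈ 8.29
```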

Model description

More information needed

Intended uses & limitations

More information needed
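
The card does not document intended uses, but the model is a masked language model, so a minimal inference sketch with the `fill-mask` pipeline would look like the following. The repo id is a placeholder for the model's actual Hub path, not a confirmed identifier.

```python
from transformers import pipeline

# "en_mlm_child_13_new" is a placeholder; substitute the model's actual Hub repo id.
unmasker = pipeline("fill-mask", model="en_mlm_child_13_new")

# The card lists "<mask>" as the mask token.
for pred in unmasker("The cat sat on the <mask>."):
    print(pred["token_str"], pred["score"])
```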

Training and evaluation data

More information needed
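
The training and evaluation data are not documented. For orientation only, the sketch below shows the standard masked-LM data collation in this library; the tokenizer source and the 15% masking probability (the library default) are assumptions, not documented facts about this model.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Placeholder repo id; the actual tokenizer used for this model is not documented.
tokenizer = AutoTokenizer.from_pretrained("en_mlm_child_13_new")

# Standard MLM collation: randomly mask input tokens and set labels accordingly.
# mlm_probability=0.15 is the library default, assumed here.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```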

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
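
As a sketch, these hyperparameters map onto the library's `TrainingArguments` roughly as follows; `output_dir` and the evaluation cadence (every 2,000 steps, matching the results table below) are assumptions.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="en_mlm_child_13_new",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    eval_strategy="steps",
    eval_steps=2_000,  # assumption inferred from the results table
)
```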

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.3952  | 2000   | 7.2178          |
| 7.1622        | 2.7904  | 4000   | 6.0331          |
| 7.1622        | 4.1856  | 6000   | 5.8973          |
| 5.7182        | 5.5807  | 8000   | 5.8265          |
| 5.7182        | 6.9759  | 10000  | 5.7288          |
| 5.5212        | 8.3711  | 12000  | 5.6320          |
| 5.5212        | 9.7663  | 14000  | 5.5558          |
| 5.3806        | 11.1615 | 16000  | 5.5296          |
| 5.3806        | 12.5567 | 18000  | 5.4822          |
| 5.2715        | 13.9519 | 20000  | 5.4140          |
| 5.2715        | 15.3471 | 22000  | 5.4000          |
| 5.168         | 16.7422 | 24000  | 5.2602          |
| 5.168         | 18.1374 | 26000  | 5.0938          |
| 4.8891        | 19.5326 | 28000  | 4.6368          |
| 4.8891        | 20.9278 | 30000  | 3.9178          |
| 3.8558        | 22.3230 | 32000  | 3.4494          |
| 3.8558        | 23.7182 | 34000  | 3.2156          |
| 3.076         | 25.1134 | 36000  | 3.0643          |
| 3.076         | 26.5085 | 38000  | 2.9367          |
| 2.7839        | 27.9037 | 40000  | 2.8451          |
| 2.7839        | 29.2989 | 42000  | 2.7795          |
| 2.5968        | 30.6941 | 44000  | 2.7031          |
| 2.5968        | 32.0893 | 46000  | 2.6234          |
| 2.454         | 33.4845 | 48000  | 2.5663          |
| 2.454         | 34.8797 | 50000  | 2.5217          |
| 2.3492        | 36.2749 | 52000  | 2.4736          |
| 2.3492        | 37.6700 | 54000  | 2.4415          |
| 2.2704        | 39.0652 | 56000  | 2.4258          |
| 2.2704        | 40.4604 | 58000  | 2.3983          |
| 2.1991        | 41.8556 | 60000  | 2.3774          |
| 2.1991        | 43.2508 | 62000  | 2.3279          |
| 2.1443        | 44.6460 | 64000  | 2.3218          |
| 2.1443        | 46.0412 | 66000  | 2.2995          |
| 2.0997        | 47.4363 | 68000  | 2.2655          |
| 2.0997        | 48.8315 | 70000  | 2.2534          |
| 2.0561        | 50.2267 | 72000  | 2.2509          |
| 2.0561        | 51.6219 | 74000  | 2.2346          |
| 2.0224        | 53.0171 | 76000  | 2.2066          |
| 2.0224        | 54.4123 | 78000  | 2.2125          |
| 1.9954        | 55.8075 | 80000  | 2.2034          |
| 1.9954        | 57.2027 | 82000  | 2.1744          |
| 1.9698        | 58.5978 | 84000  | 2.1743          |
| 1.9698        | 59.9930 | 86000  | 2.1597          |
| 1.9443        | 61.3882 | 88000  | 2.1533          |
| 1.9443        | 62.7834 | 90000  | 2.1369          |
| 1.9283        | 64.1786 | 92000  | 2.1471          |
| 1.9283        | 65.5738 | 94000  | 2.1129          |
| 1.9124        | 66.9690 | 96000  | 2.1441          |
| 1.9124        | 68.3641 | 98000  | 2.1378          |
| 1.9004        | 69.7593 | 100000 | 2.1155          |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model details

  • Model size: 14.9M params (Safetensors)
  • Tensor type: F32
  • Mask token: <mask>