modernbert-dllm-tulu

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6432

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 1
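
The relationship between the per-device and total batch sizes, and the shape of the linear learning-rate schedule, can be sketched as below. This is a minimal illustration, not the training script. The card does not state warmup steps, gradient accumulation, or the total step count, so `WARMUP_STEPS = 0`, `GRAD_ACCUM_STEPS = 1`, and `TOTAL_STEPS = 6020` are assumptions (the last one is an estimate inferred from the results table, where step 6000 falls at epoch 0.9967). The helper names are hypothetical.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 5e-05
PER_DEVICE_TRAIN_BATCH_SIZE = 32
NUM_DEVICES = 4

# Assumptions (not stated in the card): no warmup, no gradient accumulation,
# and roughly 6020 optimizer steps in the single epoch.
WARMUP_STEPS = 0
GRAD_ACCUM_STEPS = 1
TOTAL_STEPS = 6020


def total_batch_size(per_device, num_devices, grad_accum=1):
    """Number of examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def linear_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # 32 per device * 4 devices matches the reported total_train_batch_size.
    print(total_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS))
    for step in (0, 3010, 6020):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 5e-05, is about half that value midway through the epoch, and reaches zero at the final step, which is consistent with the validation loss flattening toward the end of training.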

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0.0332 | 200 | 1.7948 |
| No log | 0.0664 | 400 | 1.7504 |
| 1.7964 | 0.0997 | 600 | 1.7230 |
| 1.7964 | 0.1329 | 800 | 1.7046 |
| 1.717 | 0.1661 | 1000 | 1.6923 |
| 1.717 | 0.1993 | 1200 | 1.6827 |
| 1.717 | 0.2326 | 1400 | 1.6752 |
| 1.6662 | 0.2658 | 1600 | 1.6689 |
| 1.6662 | 0.2990 | 1800 | 1.6638 |
| 1.6667 | 0.3322 | 2000 | 1.6601 |
| 1.6667 | 0.3654 | 2200 | 1.6574 |
| 1.6667 | 0.3987 | 2400 | 1.6544 |
| 1.6626 | 0.4319 | 2600 | 1.6525 |
| 1.6626 | 0.4651 | 2800 | 1.6505 |
| 1.6472 | 0.4983 | 3000 | 1.6493 |
| 1.6472 | 0.5316 | 3200 | 1.6479 |
| 1.6472 | 0.5648 | 3400 | 1.6469 |
| 1.6354 | 0.5980 | 3600 | 1.6460 |
| 1.6354 | 0.6312 | 3800 | 1.6454 |
| 1.6457 | 0.6645 | 4000 | 1.6448 |
| 1.6457 | 0.6977 | 4200 | 1.6445 |
| 1.6457 | 0.7309 | 4400 | 1.6440 |
| 1.6404 | 0.7641 | 4600 | 1.6437 |
| 1.6404 | 0.7973 | 4800 | 1.6436 |
| 1.6472 | 0.8306 | 5000 | 1.6435 |
| 1.6472 | 0.8638 | 5200 | 1.6434 |
| 1.6472 | 0.8970 | 5400 | 1.6433 |
| 1.6394 | 0.9302 | 5600 | 1.6433 |
| 1.6394 | 0.9635 | 5800 | 1.6432 |
| 1.6313 | 0.9967 | 6000 | 1.6432 |
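
Since the training dataset is not documented, its approximate size can be estimated from the results themselves. The sketch below uses only the first and last evaluation rows and the reported total_train_batch_size of 128. It assumes every optimizer step consumes a full batch with no gradient accumulation, and the epoch fractions are rounded to four decimals, so the figures are rough estimates rather than reported values. The function names are hypothetical.

```python
# (epoch fraction, step, validation loss) for the first and last evaluation rows.
FIRST = (0.0332, 200, 1.7948)
LAST = (0.9967, 6000, 1.6432)
TOTAL_TRAIN_BATCH_SIZE = 128  # from the hyperparameter list


def steps_per_epoch(epoch_fraction, step):
    """Estimate optimizer steps in one full epoch from a (fraction, step) pair."""
    return step / epoch_fraction


def relative_drop(start, end):
    """Fractional reduction from start to end."""
    return (start - end) / start


est_steps = steps_per_epoch(LAST[0], LAST[1])
# Assumes one full batch of 128 examples per step (no gradient accumulation).
est_examples = est_steps * TOTAL_TRAIN_BATCH_SIZE
loss_drop = relative_drop(FIRST[2], LAST[2])

if __name__ == "__main__":
    print(f"estimated steps per epoch: {est_steps:.0f}")
    print(f"estimated training examples: {est_examples:.0f}")
    print(f"validation loss reduction: {loss_drop:.1%}")
```

With these inputs the estimate comes to roughly 6,020 steps per epoch, about 770,000 training examples, and a validation loss reduction of about 8.4% between step 200 and step 6000.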

Framework versions

  • Transformers 4.53.0
  • Pytorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.2

Model size: 396M params (Safetensors, tensor type BF16)