modernbert-dllm-tulu

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.6432

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 128
total_eval_batch_size: 128
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 1
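The total batch sizes above are derived from the per-device values. The minimal Python sketch below reproduces the 128-sample global batch and the learning rate under the linear schedule at a given step. It assumes no gradient accumulation and no warmup, because neither is listed. `TrainingConfig` and `linear_lr` are hypothetical names used only for illustration and are not part of any library. The decay formula follows the linear-with-warmup schedule used by the Hugging Face Trainer: warm up linearly, then decay linearly to zero at the final step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the hyperparameter list above.
    learning_rate: float = 5e-5
    train_batch_size: int = 32  # per device
    eval_batch_size: int = 32   # per device
    num_devices: int = 4
    num_epochs: int = 1
    seed: int = 42
    # Assumption: no gradient accumulation and no warmup (neither is listed).
    gradient_accumulation_steps: int = 1
    warmup_steps: int = 0

    @property
    def total_train_batch_size(self) -> int:
        # Global batch = per-device batch x devices x accumulation steps.
        return self.train_batch_size * self.num_devices * self.gradient_accumulation_steps

    @property
    def total_eval_batch_size(self) -> int:
        return self.eval_batch_size * self.num_devices


def linear_lr(step: int, total_steps: int, cfg: TrainingConfig) -> float:
    """Learning rate at `step`: linear warmup (if any), then linear decay to 0."""
    if step < cfg.warmup_steps:
        return cfg.learning_rate * step / max(1, cfg.warmup_steps)
    remaining = max(0, total_steps - step)
    return cfg.learning_rate * remaining / max(1, total_steps - cfg.warmup_steps)


if __name__ == "__main__":
    cfg = TrainingConfig()
    print(cfg.total_train_batch_size)  # → 128
    # Hypothetical total of 6000 steps, used only to illustrate the decay.
    print(linear_lr(3000, 6000, cfg))
```

With no warmup, the rate starts at the peak value of 5e-05 and falls in a straight line. Halfway through training it is about 2.5e-05, and it reaches 0 at the last step.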

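Training results

Validation loss was computed every 200 steps. "No log" in the Training Loss column means that no training loss had been logged yet at that evaluation step.

Training Loss	Epoch	Step	Validation Loss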
No log	0.0332	200	1.7948
No log	0.0664	400	1.7504
1.7964	0.0997	600	1.7230
1.7964	0.1329	800	1.7046
1.717	0.1661	1000	1.6923
1.717	0.1993	1200	1.6827
1.717	0.2326	1400	1.6752
1.6662	0.2658	1600	1.6689
1.6662	0.2990	1800	1.6638
1.6667	0.3322	2000	1.6601
1.6667	0.3654	2200	1.6574
1.6667	0.3987	2400	1.6544
1.6626	0.4319	2600	1.6525
1.6626	0.4651	2800	1.6505
1.6472	0.4983	3000	1.6493
1.6472	0.5316	3200	1.6479
1.6472	0.5648	3400	1.6469
1.6354	0.5980	3600	1.6460
1.6354	0.6312	3800	1.6454
1.6457	0.6645	4000	1.6448
1.6457	0.6977	4200	1.6445
1.6457	0.7309	4400	1.6440
1.6404	0.7641	4600	1.6437
1.6404	0.7973	4800	1.6436
1.6472	0.8306	5000	1.6435
1.6472	0.8638	5200	1.6434
1.6472	0.8970	5400	1.6433
1.6394	0.9302	5600	1.6433
1.6394	0.9635	5800	1.6432
1.6313	0.9967	6000	1.6432
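The table can be summarized numerically. The sketch below copies the rows above into `RESULTS` and computes four quantities:

- an estimate of optimizer steps per epoch, assuming the Epoch column is the step count divided by steps per epoch and rounded to four decimals;
- the total drop in validation loss;
- the share of that drop reached by a given step;
- the first evaluation step after which every further evaluation improves the loss by less than a chosen tolerance.

All function names and the tolerance value are hypothetical, chosen for illustration.

```python
# (step, epoch, validation_loss) rows copied from the training-results table above.
RESULTS = [
    (200, 0.0332, 1.7948), (400, 0.0664, 1.7504), (600, 0.0997, 1.7230),
    (800, 0.1329, 1.7046), (1000, 0.1661, 1.6923), (1200, 0.1993, 1.6827),
    (1400, 0.2326, 1.6752), (1600, 0.2658, 1.6689), (1800, 0.2990, 1.6638),
    (2000, 0.3322, 1.6601), (2200, 0.3654, 1.6574), (2400, 0.3987, 1.6544),
    (2600, 0.4319, 1.6525), (2800, 0.4651, 1.6505), (3000, 0.4983, 1.6493),
    (3200, 0.5316, 1.6479), (3400, 0.5648, 1.6469), (3600, 0.5980, 1.6460),
    (3800, 0.6312, 1.6454), (4000, 0.6645, 1.6448), (4200, 0.6977, 1.6445),
    (4400, 0.7309, 1.6440), (4600, 0.7641, 1.6437), (4800, 0.7973, 1.6436),
    (5000, 0.8306, 1.6435), (5200, 0.8638, 1.6434), (5400, 0.8970, 1.6433),
    (5600, 0.9302, 1.6433), (5800, 0.9635, 1.6432), (6000, 0.9967, 1.6432),
]


def estimate_steps_per_epoch(rows):
    """Estimate steps per epoch from the last row, where display rounding matters least.

    Assumes epoch == step / steps_per_epoch, rounded to four decimals for display.
    """
    step, epoch, _ = rows[-1]
    return round(step / epoch)


def total_improvement(rows):
    """Drop in validation loss between the first and the last evaluation."""
    return rows[0][2] - rows[-1][2]


def fraction_of_improvement_by(rows, step):
    """Share of the total improvement already achieved at the given evaluation step."""
    loss_at_step = next(loss for s, _, loss in rows if s == step)
    return (rows[0][2] - loss_at_step) / total_improvement(rows)


def plateau_start(rows, tol):
    """First eval step from which every later eval improves the loss by less than `tol`."""
    start = None
    for (_, _, prev), (step, _, cur) in zip(rows, rows[1:]):
        if prev - cur >= tol:
            start = None  # a large improvement resets the candidate plateau
        elif start is None:
            start = step
    return start


if __name__ == "__main__":
    steps = estimate_steps_per_epoch(RESULTS)
    print("steps per epoch ~", steps)
    # Assumes every step consumed a full global batch of 128 examples.
    print("implied examples per epoch ~", steps * 128)
    print(f"total improvement: {total_improvement(RESULTS):.4f}")
    print(f"share by step 3000: {fraction_of_improvement_by(RESULTS, 3000):.1%}")
    print("plateau start (tol=0.00075):", plateau_start(RESULTS, 0.00075))
```

On these rows, the estimate is about 6,020 steps per epoch. That implies roughly 770k training examples if every step saw a full batch of 128; this is approximate, since the last batch may be partial. About 96% of the validation-loss reduction between step 200 and step 6000 had already occurred by step 3000. With the illustrative tolerance, the per-evaluation gains stay below 0.00075 from step 3800 onward.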