berel_finetuned_on_HB_10_epochs

This model is a fine-tuned version of dicta-il/BEREL on an unspecified dataset. At the last logged evaluation (step 23000, epoch 9.9053) it reaches a validation loss of 8.5404.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 10
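
As a rough illustration of the settings listed above, the sketch below collects them as keyword arguments and models the linear learning-rate decay in plain Python. Several things here are assumptions rather than facts from this card: the dictionary keys follow the parameter names of `transformers.TrainingArguments`, the batch size of 8 is taken to be per device on a single device, no warmup is assumed because none is listed, and the total of 23220 steps is an estimate derived from the step/epoch ratio in the training log.

```python
# Hypothetical keyword arguments mirroring the hyperparameters listed in this card.
# Assumption: single device, so train_batch_size == per-device batch size.
training_kwargs = {
    "learning_rate": 5e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
}


def linear_lr(step, total_steps, base_lr=5e-4, warmup_steps=0):
    """Learning rate at `step` for a linear schedule.

    Ramps up linearly during warmup (if any), then decays linearly to zero
    at `total_steps`. Warmup defaults to 0 because the card lists none.
    """
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: about 2322 steps per epoch times 10 epochs (estimated, not logged).
ESTIMATED_TOTAL_STEPS = 23220

if __name__ == "__main__":
    for step in (0, 500, 11610, 23000, ESTIMATED_TOTAL_STEPS):
        print(step, linear_lr(step, ESTIMATED_TOTAL_STEPS))
```

Under these assumptions the learning rate starts at 0.0005 and is already close to zero by the last logged step, so late-epoch changes in the loss reflect very small updates.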

Training Loss	Epoch	Step	Validation Loss
6.079	0.2153	500	8.6612
8.6798	0.4307	1000	8.6493
8.6719	0.6460	1500	8.6301
8.6721	0.8613	2000	8.6263
8.5797	1.0767	2500	8.6402
8.5852	1.2920	3000	8.6847
8.5078	1.5073	3500	8.6465
8.5515	1.7227	4000	8.5885
8.5242	1.9380	4500	8.6956
8.5715	2.1533	5000	8.6018
8.5169	2.3686	5500	8.5959
8.5334	2.5840	6000	8.6440
8.52	2.7993	6500	8.6562
8.53	3.0146	7000	8.6432
8.5359	3.2300	7500	8.6783
8.4725	3.4453	8000	8.5500
8.4756	3.6606	8500	8.6706
8.4507	3.8760	9000	8.6479
8.4904	4.0913	9500	8.6086
8.4334	4.3066	10000	8.5866
8.467	4.5220	10500	8.5847
8.4697	4.7373	11000	8.6239
8.431	4.9526	11500	8.6191
8.4106	5.1680	12000	8.6032
8.4453	5.3833	12500	8.5755
8.4612	5.5986	13000	8.5904
8.4625	5.8140	13500	8.5710
8.3992	6.0293	14000	8.5509
8.3871	6.2446	14500	8.5752
8.6536	6.4599	15000	8.5860
8.4055	6.6753	15500	8.6032
8.4246	6.8906	16000	8.6231
8.4507	7.1059	16500	8.5682
8.4161	7.3213	17000	8.5964
8.356	7.5366	17500	8.5273
8.3667	7.7519	18000	8.5445
8.4002	7.9673	18500	8.5540
8.3857	8.1826	19000	8.5309
8.3508	8.3979	19500	8.6229
8.3208	8.6133	20000	8.5374
8.3711	8.8286	20500	8.5328
8.3891	9.0439	21000	8.5885
8.3473	9.2593	21500	8.5572
8.3621	9.4746	22000	8.5816
8.3778	9.6899	22500	8.4805
8.3265	9.9053	23000	8.5404
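
A few quantities can be read off the log above with a short script. This is only a sketch: it embeds a subset of the rows, and it assumes a single device with no gradient accumulation (so examples per epoch is steps per epoch times the batch size of 8) and that the reported loss is a mean cross-entropy in nats (so its exponential can be read as a perplexity). Neither assumption is confirmed by this card.

```python
import math

# (training_loss, epoch, step, validation_loss) rows copied from the log above.
# Only a subset is included to keep the sketch short.
ROWS = [
    (6.079, 0.2153, 500, 8.6612),
    (8.6798, 0.4307, 1000, 8.6493),
    (8.4725, 3.4453, 8000, 8.5500),
    (8.356, 7.5366, 17500, 8.5273),
    (8.3778, 9.6899, 22500, 8.4805),
    (8.3265, 9.9053, 23000, 8.5404),
]

TRAIN_BATCH_SIZE = 8  # from the hyperparameter list


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    _, epoch, step, _ = rows[-1]
    return round(step / epoch)


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def perplexity(loss):
    """exp(loss); meaningful only if the loss is a mean cross-entropy in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    spe = steps_per_epoch(ROWS)
    best = best_checkpoint(ROWS)
    print("steps per epoch:", spe)
    print("approx. training examples:", spe * TRAIN_BATCH_SIZE)
    print("lowest validation loss:", best[3], "at step", best[2])
    print("perplexity at that step:", perplexity(best[3]))
```

In the logged rows the lowest validation loss is 8.4805 at step 22500, slightly below the 8.5404 of the last logged step, and the validation loss stays between roughly 8.48 and 8.70 for the whole run.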