pythia-160m-shuffled-pg19

This model is a fine-tuned version of yurakuratov/pythia-160m-rnd on an unknown dataset. It reaches a final validation loss of 5.9773 on the evaluation set (step 11000 in the training results table).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 25
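
To make the scheduler settings above concrete, here is a minimal sketch of a linear schedule with warmup in plain Python. It is an illustration under assumptions, not the training code used for this model: the function name `linear_lr` is hypothetical, the total of 11175 steps is inferred from the training log (about 447 steps per epoch over 25 epochs), and rounding the warmup length up with `math.ceil` is an assumption.

```python
import math

def linear_lr(step, base_lr=3e-4, total_steps=11175, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0.

    total_steps is inferred from the log (about 447 steps/epoch * 25 epochs);
    the ceil rounding of the warmup length is an assumption.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale the base rate by the fraction of warmup completed.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale by the fraction of post-warmup steps still remaining.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Print the learning rate at a few logged steps.
for step in (0, 250, 1118, 5000, 11000):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the rate peaks at 0.0003 once warmup ends (around step 1118) and falls back to zero at the last step.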

Training Loss	Epoch	Step	Validation Loss
6.9501	0.5593	250	6.8935
6.5228	1.1186	500	6.5062
6.3755	1.6779	750	6.3901
6.2914	2.2371	1000	6.3212
6.2433	2.7964	1250	6.2800
6.1989	3.3557	1500	6.2246
6.1642	3.9150	1750	6.1784
6.1303	4.4743	2000	6.1952
6.1105	5.0336	2250	6.1830
6.1022	5.5928	2500	6.1601
6.0819	6.1521	2750	6.1461
6.0729	6.7114	3000	6.1581
6.0704	7.2707	3250	6.1284
6.0564	7.8300	3500	6.1152
6.0422	8.3893	3750	6.1052
6.0272	8.9485	4000	6.1243
6.025	9.5078	4250	6.1150
6.0096	10.0671	4500	6.0761
6.0014	10.6264	4750	6.0883
6.001	11.1857	5000	6.0950
5.9984	11.7450	5250	6.0633
5.9943	12.3043	5500	6.0714
5.9836	12.8635	5750	6.0981
5.9819	13.4228	6000	6.0536
5.9825	13.9821	6250	6.0519
5.9677	14.5414	6500	6.0923
5.9645	15.1007	6750	6.0295
5.9689	15.6600	7000	6.0396
5.9667	16.2192	7250	6.0684
5.9598	16.7785	7500	6.0128
5.9414	17.3378	7750	6.0212
5.9427	17.8971	8000	6.0452
5.9403	18.4564	8250	6.0217
5.9439	19.0157	8500	6.0177
5.9404	19.5749	8750	6.0494
5.9328	20.1342	9000	5.9959
5.9344	20.6935	9250	6.0190
5.9323	21.2528	9500	5.9959
5.9273	21.8121	9750	6.0320
5.9164	22.3714	10000	6.0198
5.9237	22.9306	10250	5.9934
5.921	23.4899	10500	6.0037
5.9169	24.0492	10750	6.0041
5.9089	24.6085	11000	5.9773
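
The losses in this table can be easier to interpret as perplexities. The sketch below assumes the reported validation loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card; the helper name `perplexity` is hypothetical.

```python
import math

def perplexity(mean_loss_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_loss_nats)

# First and last validation losses from the table (steps 250 and 11000).
first_eval, last_eval = 6.8935, 5.9773
print(f"step 250:   {perplexity(first_eval):.1f}")
print(f"step 11000: {perplexity(last_eval):.1f}")
```

If that assumption holds, the final validation loss of 5.9773 corresponds to a perplexity of about 394.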