grader_classifier

This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased on an unspecified dataset. At the final training epoch (40), it reaches a validation loss of 0.0109 and an accuracy of 0.9980 on the evaluation set.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 40
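
The scheduler and step counts above can be made concrete with a little arithmetic. The sketch below assumes zero warmup steps, a single device, and no gradient accumulation, none of which this card states. The total of 44160 steps is the final step in the training results table. `linear_lr` is a hypothetical helper that follows the usual linear warmup and decay shape, not a library function.

```python
# Values reported in this card.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 40
TOTAL_STEPS = 44160  # final step in the training results table

# Assumption: no warmup; the card does not report warmup steps.
WARMUP_STEPS = 0


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS

# Upper bound on the training set size, assuming one device and no
# gradient accumulation (the last batch of an epoch may be partial).
max_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

print(steps_per_epoch)       # 1104
print(max_train_examples)    # 17664
print(linear_lr(TOTAL_STEPS // 2))  # roughly half the initial rate
```

Under these assumptions the training set holds at most 17664 examples, and the learning rate falls from 2e-05 at step 0 to zero at step 44160.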

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.2227	1.0	1104	0.1233	0.9644
0.0966	2.0	2208	0.0647	0.9825
0.0609	3.0	3312	0.0496	0.9910
0.0378	4.0	4416	0.0548	0.9905
0.0309	5.0	5520	0.0420	0.9915
0.0230	6.0	6624	0.0621	0.9895
0.0314	7.0	7728	0.0583	0.9910
0.0219	8.0	8832	0.0577	0.9895
0.0131	9.0	9936	0.0810	0.9865
0.0074	10.0	11040	0.0200	0.9965
0.0118	11.0	12144	0.0158	0.9970
0.0125	12.0	13248	0.0288	0.9955
0.0100	13.0	14352	0.0319	0.9955
0.0089	14.0	15456	0.0179	0.9965
0.009	15.0	16560	0.0144	0.9980
0.0083	16.0	17664	0.0188	0.9965
0.0054	17.0	18768	0.0270	0.9950
0.0067	18.0	19872	0.0257	0.9955
0.0052	19.0	20976	0.0154	0.9975
0.0034	20.0	22080	0.0208	0.9975
0.0042	21.0	23184	0.0194	0.9970
0.0043	22.0	24288	0.0177	0.9970
0.002	23.0	25392	0.0293	0.9965
0.0105	24.0	26496	0.0272	0.9965
0.0017	25.0	27600	0.0254	0.9965
0.0067	26.0	28704	0.0163	0.9975
0.0054	27.0	29808	0.0100	0.9980
0.0012	28.0	30912	0.0160	0.9970
0.0016	29.0	32016	0.0104	0.9985
0.0009	30.0	33120	0.0107	0.9985
0.0005	31.0	34224	0.0110	0.9980
0.0006	32.0	35328	0.0135	0.9975
0.0007	33.0	36432	0.0122	0.9975
0.0001	34.0	37536	0.0131	0.9985
0.0005	35.0	38640	0.0053	0.9985
0.0002	36.0	39744	0.0116	0.9980
0.0000	37.0	40848	0.0110	0.9985
0.0002	38.0	41952	0.0105	0.9980
0.0003	39.0	43056	0.0107	0.9980
0.0002	40.0	44160	0.0109	0.9980
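
Validation loss is not monotonic across epochs, so the last epoch is not automatically the strongest one. The following is a minimal sketch of ranking epochs from the table, with rows copied from above as `(epoch, validation_loss, accuracy)`. The `RESULTS` list and the selection rule are illustrative and not part of the original training run, and the card does not say which checkpoint was published.

```python
# (epoch, validation_loss, accuracy), copied from the table above.
RESULTS = [
    (1, 0.1233, 0.9644), (2, 0.0647, 0.9825), (3, 0.0496, 0.9910),
    (4, 0.0548, 0.9905), (5, 0.0420, 0.9915), (6, 0.0621, 0.9895),
    (7, 0.0583, 0.9910), (8, 0.0577, 0.9895), (9, 0.0810, 0.9865),
    (10, 0.0200, 0.9965), (11, 0.0158, 0.9970), (12, 0.0288, 0.9955),
    (13, 0.0319, 0.9955), (14, 0.0179, 0.9965), (15, 0.0144, 0.9980),
    (16, 0.0188, 0.9965), (17, 0.0270, 0.9950), (18, 0.0257, 0.9955),
    (19, 0.0154, 0.9975), (20, 0.0208, 0.9975), (21, 0.0194, 0.9970),
    (22, 0.0177, 0.9970), (23, 0.0293, 0.9965), (24, 0.0272, 0.9965),
    (25, 0.0254, 0.9965), (26, 0.0163, 0.9975), (27, 0.0100, 0.9980),
    (28, 0.0160, 0.9970), (29, 0.0104, 0.9985), (30, 0.0107, 0.9985),
    (31, 0.0110, 0.9980), (32, 0.0135, 0.9975), (33, 0.0122, 0.9975),
    (34, 0.0131, 0.9985), (35, 0.0053, 0.9985), (36, 0.0116, 0.9980),
    (37, 0.0110, 0.9985), (38, 0.0105, 0.9980), (39, 0.0107, 0.9980),
    (40, 0.0109, 0.9980),
]

# Epoch with the lowest validation loss.
best_by_loss = min(RESULTS, key=lambda row: row[1])

# All epochs that tie for the highest accuracy.
top_accuracy = max(row[2] for row in RESULTS)
epochs_at_top_accuracy = [row[0] for row in RESULTS if row[2] == top_accuracy]

print(best_by_loss)            # (35, 0.0053, 0.9985)
print(epochs_at_top_accuracy)  # [29, 30, 34, 35, 37]
```

By either criterion epoch 35 comes out on top: it has the lowest validation loss (0.0053) and shares the highest accuracy (0.9985).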