vit_focus

This model is a fine-tuned version of an unspecified base checkpoint on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0604
  • MSE: 0.1248
  • MAE: 0.3083
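
Since MSE and MAE are reported alongside the loss, the model presumably produces continuous (regression-style) outputs. The snippet below is a minimal sketch of how such metrics are typically computed over an evaluation set; the function name and the dummy arrays are illustrative and not taken from this repository.

```python
# Hedged sketch: computing MSE and MAE for a regression head.
# The prediction/label arrays here are placeholders, not real evaluation data.
import numpy as np

def regression_metrics(predictions: np.ndarray, labels: np.ndarray) -> dict:
    """Mean squared error and mean absolute error over the evaluation set."""
    predictions = predictions.squeeze()
    labels = labels.squeeze()
    return {
        "mse": float(np.mean((predictions - labels) ** 2)),
        "mae": float(np.mean(np.abs(predictions - labels))),
    }

# Example with dummy values:
print(regression_metrics(np.array([0.1, 0.9]), np.array([0.0, 1.0])))
```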

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 30
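
A minimal sketch of these settings expressed as Transformers TrainingArguments. The output directory and the per-epoch evaluation strategy are assumptions; only the numeric values come from the list above.

```python
# Hedged reconstruction of the listed hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vit_focus",          # placeholder output path (assumption)
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=30,
    eval_strategy="epoch",           # assumption: per-epoch eval, matching the results table
)
```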

Training results

| Training Loss | Epoch   | Step | Validation Loss | MSE    | MAE    |
|:-------------:|:-------:|:----:|:---------------:|:------:|:------:|
| No log        | 1.0     | 25   | 0.0685          | 0.1397 | 0.3276 |
| 0.2799        | 2.0     | 50   | 0.0614          | 0.1327 | 0.3184 |
| 0.2799        | 3.0     | 75   | 0.0575          | 0.1317 | 0.3171 |
| 0.2134        | 4.0     | 100  | 0.0683          | 0.1370 | 0.3236 |
| 0.2018        | 5.0     | 125  | 0.0610          | 0.1353 | 0.3213 |
| 0.2018        | 6.0     | 150  | 0.0596          | 0.1295 | 0.3133 |
| 0.1714        | 7.0     | 175  | 0.0588          | 0.1327 | 0.3186 |
| 0.1589        | 8.0     | 200  | 0.0621          | 0.1348 | 0.3204 |
| 0.1589        | 9.0     | 225  | 0.0615          | 0.1306 | 0.3157 |
| 0.1381        | 10.0    | 250  | 0.0557          | 0.1280 | 0.3118 |
| 0.1381        | 11.0    | 275  | 0.0580          | 0.1311 | 0.3158 |
| 0.1229        | 12.0    | 300  | 0.0563          | 0.1294 | 0.3139 |
| 0.1112        | 13.0    | 325  | 0.0629          | 0.1393 | 0.3253 |
| 0.1112        | 14.0    | 350  | 0.0605          | 0.1290 | 0.3128 |
| 0.0999        | 15.0    | 375  | 0.0604          | 0.1248 | 0.3083 |
| 0.0896        | 16.0    | 400  | 0.0556          | 0.1308 | 0.3153 |
| 0.0896        | 17.0    | 425  | 0.0610          | 0.1347 | 0.3201 |
| 0.0776        | 18.0    | 450  | 0.0574          | 0.1259 | 0.3093 |
| 0.0776        | 19.0    | 475  | 0.0584          | 0.1253 | 0.3085 |
| 0.069         | 20.0    | 500  | 0.0595          | 0.1265 | 0.3097 |
| 0.0649        | 21.0    | 525  | 0.0576          | 0.1308 | 0.3150 |
| 0.0649        | 22.0    | 550  | 0.0574          | 0.1274 | 0.3109 |
| 0.056         | 23.0    | 575  | 0.0578          | 0.1307 | 0.3149 |
| 0.0508        | 24.0    | 600  | 0.0563          | 0.1296 | 0.3139 |
| 0.0508        | 25.0    | 625  | 0.0568          | 0.1312 | 0.3157 |
| 0.0468        | 26.0    | 650  | 0.0578          | 0.1287 | 0.3123 |
| 0.0468        | 27.0    | 675  | 0.0579          | 0.1305 | 0.3147 |
| 0.0432        | 28.0    | 700  | 0.0572          | 0.1301 | 0.3143 |
| 0.0419        | 28.8247 | 720  | 0.0580          | 0.1308 | 0.3150 |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.7.0
  • Datasets 3.5.1
  • Tokenizers 0.21.1