raw-xlstm

This model is a fine-tuned version of an unspecified base model, trained on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 7.0517

Model description

A checkpoint named raw-xlstm with roughly 551M parameters, stored as F32 safetensors. No further details are provided.

Intended uses & limitations

More information needed
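
The Hub cannot determine a pipeline type for this checkpoint, so the snippet below is only a minimal loading sketch, not a verified recipe: it assumes the model is a causal language model published with custom modeling code, and the repo id is a placeholder.

```python
# Minimal loading sketch. Assumptions (not stated in this card):
# - the Hub repo id ("your-username/raw-xlstm" is a placeholder),
# - the checkpoint exposes a causal-LM head,
# - trust_remote_code=True is needed for custom xLSTM modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/raw-xlstm"  # placeholder; replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Hello, world!", return_tensors="pt")
# generate() assumes the custom model class inherits GenerationMixin.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```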

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
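
For reference, the settings above map onto transformers TrainingArguments roughly as follows. This is a sketch, not the training script: the output_dir value and the use of fp16 for "Native AMP" are assumptions, and everything else mirrors the list above.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="raw-xlstm",         # assumption: output path not given in the card
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # total train batch size: 8 * 4 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
    fp16=True,                      # assumption: "Native AMP" taken to mean fp16
)
```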

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 40.8489       | 0.32   | 100  | 7.6418          |
| 28.299        | 0.64   | 200  | 6.9708          |
| 26.2015       | 0.96   | 300  | 6.6436          |
| 24.2679       | 1.2784 | 400  | 6.4288          |
| 23.5321       | 1.5984 | 500  | 6.2644          |
| 22.9093       | 1.9184 | 600  | 6.1378          |
| 20.9784       | 2.2368 | 700  | 6.0831          |
| 20.5525       | 2.5568 | 800  | 6.0163          |
| 20.3495       | 2.8768 | 900  | 5.9544          |
| 18.685        | 3.1952 | 1000 | 5.9836          |
| 17.8091       | 3.5152 | 1100 | 5.9750          |
| 17.8559       | 3.8352 | 1200 | 5.9472          |
| 16.4337       | 4.1536 | 1300 | 6.0460          |
| 15.1001       | 4.4736 | 1400 | 6.0802          |
| 15.291        | 4.7936 | 1500 | 6.0832          |
| 14.2383       | 5.112  | 1600 | 6.2050          |
| 12.4653       | 5.432  | 1700 | 6.3012          |
| 12.6628       | 5.752  | 1800 | 6.3316          |
| 12.1045       | 6.0704 | 1900 | 6.4283          |
| 10.2247       | 6.3904 | 2000 | 6.5635          |
| 10.395        | 6.7104 | 2100 | 6.6127          |
| 10.1929       | 7.0288 | 2200 | 6.6716          |
| 8.5996        | 7.3488 | 2300 | 6.8063          |
| 8.6853        | 7.6688 | 2400 | 6.8550          |
| 8.7377        | 7.9888 | 2500 | 6.8878          |
| 7.5955        | 8.3072 | 2600 | 6.9726          |
| 7.6375        | 8.6272 | 2700 | 7.0046          |
| 7.6833        | 8.9472 | 2800 | 7.0211          |
| 7.2457        | 9.2656 | 2900 | 7.0432          |
| 7.2003        | 9.5856 | 3000 | 7.0503          |
| 7.2109        | 9.9056 | 3100 | 7.0517          |

Note that validation loss bottoms out at 5.9472 (step 1200, around epoch 3.8) and climbs steadily afterwards while training loss keeps falling, so the reported final loss of 7.0517 reflects a checkpoint well past the best validation point.

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0