raw-xlstm

This model is a fine-tuned version of an unspecified base model, trained on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 7.0517

Model description

A checkpoint named raw-xlstm with roughly 551M parameters, stored as F32 safetensors. No further details are provided.

Intended uses & limitations

More information needed
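
The Hub cannot determine a pipeline type for this checkpoint, so the snippet below is only a minimal loading sketch, not a verified recipe: it assumes the model is a causal language model published with custom modeling code, and the repo id is a placeholder.

```python
# Minimal loading sketch. Assumptions (not stated in this card):
# - the Hub repo id ("your-username/raw-xlstm" is a placeholder),
# - the checkpoint exposes a causal-LM head,
# - trust_remote_code=True is needed for custom xLSTM modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/raw-xlstm"  # placeholder; replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Hello, world!", return_tensors="pt")
# generate() assumes the custom model class inherits GenerationMixin.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```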

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
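
For reference, the settings above map onto transformers TrainingArguments roughly as follows. This is a sketch, not the training script: the output_dir value and the use of fp16 for "Native AMP" are assumptions, and everything else mirrors the list above.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="raw-xlstm",         # assumption: output path not given in the card
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # total train batch size: 8 * 4 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
    fp16=True,                      # assumption: "Native AMP" taken to mean fp16
)
```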

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 40.8489       | 0.32   | 100  | 7.6418          |
| 28.299        | 0.64   | 200  | 6.9708          |
| 26.2015       | 0.96   | 300  | 6.6436          |
| 24.2679       | 1.2784 | 400  | 6.4288          |
| 23.5321       | 1.5984 | 500  | 6.2644          |
| 22.9093       | 1.9184 | 600  | 6.1378          |
| 20.9784       | 2.2368 | 700  | 6.0831          |
| 20.5525       | 2.5568 | 800  | 6.0163          |
| 20.3495       | 2.8768 | 900  | 5.9544          |
| 18.685        | 3.1952 | 1000 | 5.9836          |
| 17.8091       | 3.5152 | 1100 | 5.9750          |
| 17.8559       | 3.8352 | 1200 | 5.9472          |
| 16.4337       | 4.1536 | 1300 | 6.0460          |
| 15.1001       | 4.4736 | 1400 | 6.0802          |
| 15.291        | 4.7936 | 1500 | 6.0832          |
| 14.2383       | 5.112  | 1600 | 6.2050          |
| 12.4653       | 5.432  | 1700 | 6.3012          |
| 12.6628       | 5.752  | 1800 | 6.3316          |
| 12.1045       | 6.0704 | 1900 | 6.4283          |
| 10.2247       | 6.3904 | 2000 | 6.5635          |
| 10.395        | 6.7104 | 2100 | 6.6127          |
| 10.1929       | 7.0288 | 2200 | 6.6716          |
| 8.5996        | 7.3488 | 2300 | 6.8063          |
| 8.6853        | 7.6688 | 2400 | 6.8550          |
| 8.7377        | 7.9888 | 2500 | 6.8878          |
| 7.5955        | 8.3072 | 2600 | 6.9726          |
| 7.6375        | 8.6272 | 2700 | 7.0046          |
| 7.6833        | 8.9472 | 2800 | 7.0211          |
| 7.2457        | 9.2656 | 2900 | 7.0432          |
| 7.2003        | 9.5856 | 3000 | 7.0503          |
| 7.2109        | 9.9056 | 3100 | 7.0517          |

Note that validation loss bottoms out at 5.9472 (step 1200, around epoch 3.8) and climbs steadily afterwards while training loss keeps falling, so the reported final loss of 7.0517 reflects a checkpoint well past the best validation point.

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0