---
library_name: peft
base_model: roneneldan/TinyStories-1M
tags:
  - generated_from_trainer
model-index:
  - name: test_1M_1-2025-02-16-18-59
    results: []
---

# test_1M_1-2025-02-16-18-59

This model is a fine-tuned version of [roneneldan/TinyStories-1M](https://huggingface.co/roneneldan/TinyStories-1M) on an unspecified dataset.
It achieves the following results on the evaluation set:

- Loss: 2.3658

## Model description

More information needed

## Intended uses & limitations

More information needed
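
No usage example was provided. Below is a minimal loading sketch; the adapter repo id `stanpony/test_1M_1-2025-02-16-18-59` is an assumption based on the model name, so adjust it to wherever the adapter is actually hosted:

```python
# Sketch: load the TinyStories-1M base model and attach this PEFT adapter.
# The adapter repo id is an assumption -- replace with the actual location.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-1M")
tokenizer = AutoTokenizer.from_pretrained("roneneldan/TinyStories-1M")
model = PeftModel.from_pretrained(base, "stanpony/test_1M_1-2025-02-16-18-59")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```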

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2.5e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1
- num_epochs: 30
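
The hyperparameters above combine in two places worth making explicit: the effective batch size is the per-device batch size times the accumulation steps, and the `linear` scheduler ramps up over the warmup steps then decays to zero. A small pure-Python sketch (the schedule formula mirrors Transformers' `linear` scheduler; `total_steps=17820` is taken from the results table below):

```python
# Effective (total) train batch size = per-device batch size * gradient accumulation steps.
train_batch_size = 4
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

# Linear scheduler with warmup: ramp up over `warmup_steps`,
# then decay linearly to 0 at `total_steps`.
def linear_lr(step, base_lr=2.5e-05, warmup_steps=1, total_steps=17820):
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(total_train_batch_size)  # 8
print(linear_lr(1))            # peak learning rate: 2.5e-05
print(linear_lr(17820))        # end of training: 0.0
```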

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 2.5392        | 0.5   | 297   | 2.4605          |
| 2.4817        | 1.0   | 594   | 2.4445          |
| 2.4207        | 1.5   | 891   | 2.4344          |
| 2.4092        | 2.0   | 1188  | 2.4328          |
| 2.4385        | 2.5   | 1485  | 2.4275          |
| 2.5104        | 3.0   | 1782  | 2.4149          |
| 2.3552        | 3.5   | 2079  | 2.4131          |
| 2.402         | 4.0   | 2376  | 2.4120          |
| 2.4328        | 4.5   | 2673  | 2.4143          |
| 2.4508        | 5.0   | 2970  | 2.4052          |
| 2.2452        | 5.5   | 3267  | 2.4064          |
| 2.5212        | 6.0   | 3564  | 2.4137          |
| 2.3123        | 6.5   | 3861  | 2.4038          |
| 2.3935        | 7.0   | 4158  | 2.4001          |
| 2.2864        | 7.5   | 4455  | 2.3967          |
| 2.3657        | 8.0   | 4752  | 2.3980          |
| 2.5036        | 8.5   | 5049  | 2.4018          |
| 2.3336        | 9.0   | 5346  | 2.3965          |
| 2.3799        | 9.5   | 5643  | 2.3916          |
| 2.478         | 10.0  | 5940  | 2.3979          |
| 2.3376        | 10.5  | 6237  | 2.3923          |
| 2.3039        | 11.0  | 6534  | 2.3923          |
| 2.3658        | 11.5  | 6831  | 2.3900          |
| 2.473         | 12.0  | 7128  | 2.3901          |
| 2.3923        | 12.5  | 7425  | 2.3869          |
| 2.4122        | 13.0  | 7722  | 2.3867          |
| 2.4238        | 13.5  | 8019  | 2.3870          |
| 2.4234        | 14.0  | 8316  | 2.3843          |
| 2.4062        | 14.5  | 8613  | 2.3869          |
| 2.3188        | 15.0  | 8910  | 2.3813          |
| 2.2888        | 15.5  | 9207  | 2.3835          |
| 2.3326        | 16.0  | 9504  | 2.3779          |
| 2.3273        | 16.5  | 9801  | 2.3807          |
| 2.3338        | 17.0  | 10098 | 2.3788          |
| 2.4337        | 17.5  | 10395 | 2.3792          |
| 2.3396        | 18.0  | 10692 | 2.3800          |
| 2.3172        | 18.5  | 10989 | 2.3806          |
| 2.3586        | 19.0  | 11286 | 2.3807          |
| 2.3708        | 19.5  | 11583 | 2.3789          |
| 2.449         | 20.0  | 11880 | 2.3762          |
| 2.3071        | 20.5  | 12177 | 2.3786          |
| 2.2589        | 21.0  | 12474 | 2.3750          |
| 2.2423        | 21.5  | 12771 | 2.3749          |
| 2.2852        | 22.0  | 13068 | 2.3737          |
| 2.2754        | 22.5  | 13365 | 2.3750          |
| 2.2977        | 23.0  | 13662 | 2.3737          |
| 2.2701        | 23.5  | 13959 | 2.3701          |
| 2.2638        | 24.0  | 14256 | 2.3726          |
| 2.377         | 24.5  | 14553 | 2.3733          |
| 2.3774        | 25.0  | 14850 | 2.3725          |
| 2.2137        | 25.5  | 15147 | 2.3722          |
| 2.3267        | 26.0  | 15444 | 2.3681          |
| 2.2415        | 26.5  | 15741 | 2.3706          |
| 2.2957        | 27.0  | 16038 | 2.3687          |
| 2.3003        | 27.5  | 16335 | 2.3678          |
| 2.3662        | 28.0  | 16632 | 2.3678          |
| 2.305         | 28.5  | 16929 | 2.3673          |
| 2.2603        | 29.0  | 17226 | 2.3667          |
| 2.2806        | 29.5  | 17523 | 2.3665          |
| 2.2674        | 30.0  | 17820 | 2.3658          |
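
The step column advances by 297 every half epoch, so the table's totals can be cross-checked with a few lines of arithmetic. The implied dataset size is approximate, since the last batch of an epoch may be smaller than the full effective batch size:

```python
# 297 steps per half epoch -> 594 optimizer steps per epoch.
steps_per_epoch = 594
num_epochs = 30
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 17820, matching the final row of the table

# With a total train batch size of 8, this implies roughly
# 594 * 8 = 4752 training examples per epoch (approximate:
# the final batch of an epoch may be partial).
approx_examples_per_epoch = steps_per_epoch * 8
print(approx_examples_per_epoch)  # 4752
```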

### Framework versions

- PEFT 0.14.0
- Transformers 4.48.1
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0