---
library_name: peft
base_model: roneneldan/TinyStories-1M
tags:
- generated_from_trainer
model-index:
- name: test_1M_1-2025-02-16-18-59
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# test_1M_1-2025-02-16-18-59

This model is a fine-tuned version of [roneneldan/TinyStories-1M](https://huggingface.co/roneneldan/TinyStories-1M) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 2.3658 (for reference, this cross-entropy loss corresponds to a perplexity of exp(2.3658) ≈ 10.65)

## Model description

This is a PEFT adapter trained on top of [roneneldan/TinyStories-1M](https://huggingface.co/roneneldan/TinyStories-1M). No further description was provided; a minimal loading sketch follows.
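A minimal usage sketch: load the base model and apply this adapter with the `peft` library. The repo id `your-username/test_1M_1-2025-02-16-18-59` is a placeholder; replace it with the actual adapter repository.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model named in this card's metadata.
base = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-1M")
tokenizer = AutoTokenizer.from_pretrained("roneneldan/TinyStories-1M")

# Apply the adapter weights (placeholder repo id).
model = PeftModel.from_pretrained(base, "your-username/test_1M_1-2025-02-16-18-59")
model.eval()

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```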

## Intended uses & limitations

More information needed
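As a general note (not specific to this card): a PEFT adapter can also be merged into its base model for standalone inference. A sketch, assuming a LoRA-style adapter and the same placeholder repo id as above:

```python
from peft import AutoPeftModelForCausalLM

# Load base model + adapter in one call, then fold the adapter weights
# into the base so inference no longer needs the peft library.
# Placeholder repo id; merge_and_unload assumes a LoRA-style adapter.
model = AutoPeftModelForCausalLM.from_pretrained("your-username/test_1M_1-2025-02-16-18-59")
merged = model.merge_and_unload()
merged.save_pretrained("tinystories-1m-merged")
```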

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):
- learning_rate: 2.5e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: paged_adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1
- num_epochs: 30
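
A sketch of a `transformers.TrainingArguments` configuration matching the list above; `output_dir` and anything not listed in this card are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="test_1M_1-2025-02-16-18-59",  # assumed; not recorded in the card
    learning_rate=2.5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    optim="paged_adamw_8bit",       # paged 8-bit AdamW (requires bitsandbytes)
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=1,
    num_train_epochs=30,
)
```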

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 2.5392        | 0.5   | 297   | 2.4605          |
| 2.4817        | 1.0   | 594   | 2.4445          |
| 2.4207        | 1.5   | 891   | 2.4344          |
| 2.4092        | 2.0   | 1188  | 2.4328          |
| 2.4385        | 2.5   | 1485  | 2.4275          |
| 2.5104        | 3.0   | 1782  | 2.4149          |
| 2.3552        | 3.5   | 2079  | 2.4131          |
| 2.4020        | 4.0   | 2376  | 2.4120          |
| 2.4328        | 4.5   | 2673  | 2.4143          |
| 2.4508        | 5.0   | 2970  | 2.4052          |
| 2.2452        | 5.5   | 3267  | 2.4064          |
| 2.5212        | 6.0   | 3564  | 2.4137          |
| 2.3123        | 6.5   | 3861  | 2.4038          |
| 2.3935        | 7.0   | 4158  | 2.4001          |
| 2.2864        | 7.5   | 4455  | 2.3967          |
| 2.3657        | 8.0   | 4752  | 2.3980          |
| 2.5036        | 8.5   | 5049  | 2.4018          |
| 2.3336        | 9.0   | 5346  | 2.3965          |
| 2.3799        | 9.5   | 5643  | 2.3916          |
| 2.4780        | 10.0  | 5940  | 2.3979          |
| 2.3376        | 10.5  | 6237  | 2.3923          |
| 2.3039        | 11.0  | 6534  | 2.3923          |
| 2.3658        | 11.5  | 6831  | 2.3900          |
| 2.4730        | 12.0  | 7128  | 2.3901          |
| 2.3923        | 12.5  | 7425  | 2.3869          |
| 2.4122        | 13.0  | 7722  | 2.3867          |
| 2.4238        | 13.5  | 8019  | 2.3870          |
| 2.4234        | 14.0  | 8316  | 2.3843          |
| 2.4062        | 14.5  | 8613  | 2.3869          |
| 2.3188        | 15.0  | 8910  | 2.3813          |
| 2.2888        | 15.5  | 9207  | 2.3835          |
| 2.3326        | 16.0  | 9504  | 2.3779          |
| 2.3273        | 16.5  | 9801  | 2.3807          |
| 2.3338        | 17.0  | 10098 | 2.3788          |
| 2.4337        | 17.5  | 10395 | 2.3792          |
| 2.3396        | 18.0  | 10692 | 2.3800          |
| 2.3172        | 18.5  | 10989 | 2.3806          |
| 2.3586        | 19.0  | 11286 | 2.3807          |
| 2.3708        | 19.5  | 11583 | 2.3789          |
| 2.449         | 20.0  | 11880 | 2.3762          |
| 2.3071        | 20.5  | 12177 | 2.3786          |
| 2.2589        | 21.0  | 12474 | 2.3750          |
| 2.2423        | 21.5  | 12771 | 2.3749          |
| 2.2852        | 22.0  | 13068 | 2.3737          |
| 2.2754        | 22.5  | 13365 | 2.3750          |
| 2.2977        | 23.0  | 13662 | 2.3737          |
| 2.2701        | 23.5  | 13959 | 2.3701          |
| 2.2638        | 24.0  | 14256 | 2.3726          |
| 2.377         | 24.5  | 14553 | 2.3733          |
| 2.3774        | 25.0  | 14850 | 2.3725          |
| 2.2137        | 25.5  | 15147 | 2.3722          |
| 2.3267        | 26.0  | 15444 | 2.3681          |
| 2.2415        | 26.5  | 15741 | 2.3706          |
| 2.2957        | 27.0  | 16038 | 2.3687          |
| 2.3003        | 27.5  | 16335 | 2.3678          |
| 2.3662        | 28.0  | 16632 | 2.3678          |
| 2.3050        | 28.5  | 16929 | 2.3673          |
| 2.2603        | 29.0  | 17226 | 2.3667          |
| 2.2806        | 29.5  | 17523 | 2.3665          |
| 2.2674        | 30.0  | 17820 | 2.3658          |
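
The step counts are consistent with the batch configuration above: evaluation ran every 297 optimizer steps (half an epoch), so one epoch is 594 steps, which at an effective batch size of 8 corresponds to roughly 594 × 8 = 4,752 training examples per epoch.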


### Framework versions

- PEFT 0.14.0
- Transformers 4.48.1
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0