How long was this model trained? #8
opened by jploski
How many steps/epochs was this particular model trained for? And which of the datasets was used: was it https://huggingface.co/datasets/roneneldan/TinyStories/tree/main/TinyStories-train.txt?
I can only find Figure 3 in the paper, which shows 2.5K steps. Am I right that this translates into ~1.5 epochs over the TinyStories-train.txt dataset with the parameters from the model card?
About 20 epochs. Context length 512, batch size 80 (20 per device across 4 V100 GPUs), 16 gradient accumulation steps. Learning rate 5e-4, weight decay 0.1, Adam betas (0.9, 0.95). The file used for training was indeed https://huggingface.co/datasets/roneneldan/TinyStories/blob/main/TinyStories-train.txt.
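For reference, here is a small sketch collecting those hyperparameters into one place and working out the effective batch size. The field names are illustrative, not taken from the actual training script:

```python
# Hypothetical config dict summarizing the hyperparameters stated above.
config = {
    "context_length": 512,
    "per_device_batch_size": 20,
    "num_gpus": 4,                     # 4x V100
    "gradient_accumulation_steps": 16,
    "learning_rate": 5e-4,
    "weight_decay": 0.1,
    "adam_betas": (0.9, 0.95),
    "epochs": 20,
}

# Per optimizer-step batch across devices: 20 * 4 = 80 sequences,
# matching the "batch size 80" figure; with gradient accumulation
# the effective batch per update is 80 * 16 = 1280 sequences.
per_step_batch = config["per_device_batch_size"] * config["num_gpus"]
effective_batch = per_step_batch * config["gradient_accumulation_steps"]
print(per_step_batch, effective_batch)  # 80 1280
```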
Thanks for the detailed info!
jploski changed discussion status to closed