Question about the size of training tokens
#34 opened by feiyulv
Hi, the StarCoder paper says the model was trained on 1 trillion tokens, but the StarCoder dataset page says the total dataset is about 250 billion tokens. How should I understand the difference? Is it caused by the FIM strategy?
The model was trained for multiple epochs over the dataset to reach 1 trillion tokens.
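The arithmetic behind this can be sketched as follows (a minimal illustration using the ~250 billion and 1 trillion figures from the thread; the exact token counts in the paper may differ slightly):

```python
# Approximate figures from the discussion above.
dataset_tokens = 250e9          # ~250 billion tokens in the StarCoder dataset
total_training_tokens = 1e12    # ~1 trillion tokens seen during training

# Number of passes (epochs) over the dataset needed to reach the total.
epochs = total_training_tokens / dataset_tokens
print(f"Approximate epochs: {epochs:.1f}")  # → Approximate epochs: 4.0
```

So seeing 1 trillion tokens over a ~250 billion token dataset corresponds to roughly four epochs; FIM reorders tokens within training examples but does not change the total token count.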
Thank you
feiyulv changed discussion status to closed