Question about the size of training tokens

#34
by feiyulv - opened

Hi, the StarCoder paper says the model was trained on 1 trillion tokens, but the StarCoder dataset card says the total dataset is about 250 billion tokens. How should I understand the difference? Is it caused by the FIM strategy?

The model was trained for multiple epochs to reach 1 trillion tokens: roughly four passes over the ~250B-token dataset, rather than anything added by FIM.
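For a quick sanity check, the two figures are consistent with about four epochs. A back-of-the-envelope sketch, using the approximate token counts quoted above:

```python
# Approximate figures from the thread, not exact training logs
dataset_tokens = 250e9   # ~250B tokens in the StarCoder dataset
trained_tokens = 1e12    # 1T tokens seen during training, per the paper

epochs = trained_tokens / dataset_tokens
print(f"~{epochs:.1f} epochs")  # -> ~4.0 epochs
```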

Thank you

feiyulv changed discussion status to closed