Is the tokenizer missing settings?
#4, opened by cmcmaster
I'm having trouble fine-tuning the Galactica models. In particular, the tokenizer seems to be missing settings such as a defined padding token ("[PAD]"). See: https://github.com/paperswithcode/galai/blob/f056e1ad791f994428ca81e25683ed9656b6958f/galai/model.py#L85
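To make the problem concrete, here is a minimal sketch of the kind of workaround I would expect to need before batching inputs for fine-tuning. The helper name `ensure_pad_token` and the default string `"<pad>"` are my own assumptions, not something taken from the model card; check the tokenizer's vocabulary for the right padding string. The calls `add_special_tokens` and `resize_token_embeddings` are the standard `transformers` methods.

```python
def ensure_pad_token(tokenizer, model=None, pad_token="<pad>"):
    """Give `tokenizer` a padding token if it has none.

    `pad_token` is an assumed default; use whatever string the
    model's vocabulary actually reserves for padding.
    Returns the number of tokens newly added to the vocabulary.
    """
    if tokenizer.pad_token is not None:
        return 0  # already configured, nothing to do

    # Registers the string as the pad token. If it is already in the
    # vocabulary, its existing id is reused and 0 is returned.
    num_added = tokenizer.add_special_tokens({"pad_token": pad_token})

    # Only grow the embedding matrix if the vocabulary actually grew.
    if num_added > 0 and model is not None:
        model.resize_token_embeddings(len(tokenizer))
    return num_added


# Hypothetical usage (downloads weights, so left commented out):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
# model = AutoModelForCausalLM.from_pretrained("facebook/galactica-125m")
# ensure_pad_token(tokenizer, model)
# batch = tokenizer(["a", "a longer input"], padding=True, return_tensors="pt")
```

Having to do this by hand for every checkpoint is what makes me think the setting is missing from the tokenizer config itself.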
Here is a great article by Patrick von Platen (Hugging Face) that does an excellent job of explaining the details for another LLM (BLOOM):
https://huggingface.co/blog/how-to-generate