PAD token for Instruction Fine Tuning

by MauroCE

What is the correct token to set as the PAD token for instruction fine-tuning (IFT)?

Models like Llama-3.2 have a special token <|finetune_right_pad_id|> that is left untrained during pre-training and can be used for exactly this purpose.
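For Llama-3.2 that looks roughly like this with the `transformers` tokenizer (the checkpoint name is just an example):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-3.2-1B"  # example checkpoint, any Llama-3.2 variant has the same token
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Reuse the dedicated, untrained padding token that ships with the tokenizer
tokenizer.pad_token = "<|finetune_right_pad_id|>"
model.config.pad_token_id = tokenizer.pad_token_id
```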

I noticed this model doesn’t have any “special reserved tokens” that could be repurposed for this. Of course one could add a new special token and resize the embeddings (sketched below), but that would require additional training to tune the new embedding.
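What I mean by that, roughly (model name is a placeholder for the model in question):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "some/base-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add a brand-new PAD token and grow the embedding matrix to make room for it
tokenizer.add_special_tokens({"pad_token": "<|pad|>"})
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id
# The new embedding row is randomly initialized, so it would still need to be trained
```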

Setting the EOS token as the PAD token is discouraged for IFT: padding positions are typically masked out of the loss, so the model never gets a learning signal for emitting EOS, which can lead to endless generation.
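A toy illustration of what I mean, assuming the usual convention of masking labels with -100:

```python
import torch

pad_id = eos_id = 2  # EOS reused as PAD
input_ids = torch.tensor([5, 6, 7, eos_id, pad_id, pad_id])

labels = input_ids.clone()
labels[labels == pad_id] = -100   # typical padding mask; -100 is ignored by the loss

print(labels)  # tensor([   5,    6,    7, -100, -100, -100])  <- the EOS position is masked too
```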

What’s the correct approach here?
