Unexpected Leading sos/eos Tokens at Start of Generation

#105 opened by E1eMental

Issue Description

When testing Florence-2-large, I observed unexpected token generation behavior. Specifically:

  • Decoding starts by feeding decoder_start_token_id=2 (which is also the eos_token_id) to the decoder.
  • The model then generates three tokens with index 0 (the sos_token) before producing any meaningful tokens.
  • Notably, sos_token_id=0 and eos_token_id=2, yet generation starts with the eos_token_id rather than the sos_token_id (a minimal reproduction sketch follows this list).

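To make the observation concrete, here is a minimal sketch for inspecting the raw generated token IDs. It assumes local access to the public microsoft/Florence-2-large checkpoint, an example image file sample.jpg, and the <CAPTION> task prompt; these specifics are only illustrative, the point is the printed IDs.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "microsoft/Florence-2-large"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Token-id bookkeeping: 0 is <s> (what I call sos above), 2 is </s> (eos).
print("bos_token_id:", processor.tokenizer.bos_token_id)
print("eos_token_id:", processor.tokenizer.eos_token_id)

# Where decoder_start_token_id lives can differ between config versions,
# so check both the top-level config and the nested text config.
cfg = model.config
start_id = getattr(cfg, "decoder_start_token_id", None)
if start_id is None and hasattr(cfg, "text_config"):
    start_id = getattr(cfg.text_config, "decoder_start_token_id", None)
print("decoder_start_token_id:", start_id)

image = Image.open("sample.jpg")
inputs = processor(text="<CAPTION>", images=image, return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=64,
    )

# The first few IDs show the decoder start token and the leading 0 tokens
# described above, before the meaningful output begins.
print(generated_ids[0][:8].tolist())
```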
This leads to two questions:

  1. Why does generation begin with the eos_token_id instead of the sos_token_id?
  2. Why are multiple leading tokens with index 0 (the sos_token_id) generated before the actual output?

Finetuning & Label Construction

While finetuning Florence-2 on a custom task (following the recipe at https://huggingface.co/blog/finetune-florence2), I discovered the following during debugging:

  • The target label sequences used during training start with a leading 0 (sos_token_id): [0, valuable_tokens, 2].
  • As a result, the model is forced to generate a "junk" 0 token at the start of each sequence, and the loss is computed on this token as well.
  • This might explain why the model always generates sos (0) tokens at the beginning during inference (see the sketch below).
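For reference, here is a minimal sketch of what I mean, based on how the blog's collate function tokenizes the target answers. The example answer string and the drop-the-leading-bos workaround at the end are mine, not part of the official recipe.

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)
tok = processor.tokenizer

# Tokenize a target string the same way the blog's collate function does.
answers = ["a cat sitting on a mat"]  # illustrative target text
labels = tok(
    text=answers, return_tensors="pt", padding=True, return_token_type_ids=False
).input_ids
print(labels[0].tolist())  # starts with 0 (<s>) and ends with 2 (</s>)

# Possible workaround (not part of the official recipe): drop the leading <s>
# so the model is not trained to emit it as its first output token.
labels_no_bos = labels[:, 1:]
print(labels_no_bos[0].tolist())
```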

Questions & Request

  • Why is token 2 (eos_token_id) used as the start of the sequence for decoding, rather than 0 (sos_token_id)?
  • Is my assumption about the leading 0 token during training correct?
  • If so, could you retrain the model (or release a new checkpoint) without this issue, so that generation is not forced to start with a junk token?
