model.config.vocab_size does not match the tokenizer's actual vocab size

#2
by Owos - opened

model.config.vocab_size has a higher value than the total number of tokens in the tokenizer
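A minimal, self-contained sketch of the check behind this report: compare the config's vocab size against the number of tokens the tokenizer actually knows. With a real checkpoint one would obtain these two values via `AutoConfig.from_pretrained(...).vocab_size` and `len(AutoTokenizer.from_pretrained(...))` from `transformers`; the numbers below are placeholders, not Gemma's actual values.

```python
def vocab_mismatch(config_vocab_size: int, tokenizer_len: int) -> int:
    """Return how many embedding rows have no corresponding tokenizer token.

    A positive result reproduces the discrepancy described in this report:
    the model's embedding matrix is larger than the tokenizer's vocabulary.
    """
    return config_vocab_size - tokenizer_len


# Placeholder values for illustration (not the real Gemma numbers):
config_vocab_size = 256000   # would come from model.config.vocab_size
tokenizer_len = 255995       # would come from len(tokenizer)

gap = vocab_mismatch(config_vocab_size, tokenizer_len)
if gap > 0:
    print(f"config.vocab_size exceeds len(tokenizer) by {gap}")
```

A gap like this is not always a bug; it only confirms that the two counts disagree, which is what this issue reports.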

Hi @Owos ,

Welcome to the Google Gemma family of open source models. Thanks for notifying us of the discrepancy between the token count and the vocab size. I have escalated this issue to our internal team.

Thanks.
