Update tokenizer_config.json

#101
by Akshay47 - opened

Your current tokenizer config:

"unk_token": null

This means no unknown token is defined, which is risky: the tokenizer has no fallback token to emit for out-of-vocabulary (OOV) input.

This update defines the unk_token, enabling the tokenizer to:

  1. Prevent crashes or undefined behavior when unknown tokens are encountered.
  2. Ensure compatibility with libraries that expect a defined unk_token.
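
As a rough illustration of the change described above, here is a minimal sketch that fills in `unk_token` only when the config has it as null or missing. The helper name `define_unk_token` and the token string `"<unk>"` are assumptions made for this example; the PR text does not say which token string is used.

```python
import json


def define_unk_token(config_text: str, unk_token: str = "<unk>") -> str:
    """Return the config JSON with unk_token set if it was null or missing.

    Hypothetical helper for illustration; "<unk>" is an assumed placeholder.
    """
    config = json.loads(config_text)
    # JSON null is loaded as None; a missing key also yields None via .get().
    if config.get("unk_token") is None:
        config["unk_token"] = unk_token
    return json.dumps(config, indent=2)


# The state quoted in this PR: unk_token is null.
current = '{"unk_token": null}'
updated = define_unk_token(current)
print(updated)
```

An already defined `unk_token` is left untouched, so the sketch is safe to run on a config that has been fixed. Whether the chosen string exists in the model's vocabulary is a separate thing to check.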
