Tokenizer not compatible with TGI — breaks on model load

#25 opened by pavlonator

Hello Mistral team,

First off, thank you for the great work you are doing!

Now the issue:
Your model mistralai/Ministral-8B-Instruct-2410 fails to load inside the Hugging Face TGI (Text Generation Inference) container due to a tokenizer deserialization issue.

Inside the TGI runtime (Rust backend), tokenizer loading fails with:

Exception: data did not match any variant of untagged enum ModelWrapper at line 1217944 column 3

This appears to come from tokenizer.json being incompatible with the tokenizers Rust crate used in TGI: the deserializer hits a structure it does not recognize.

However, the same tokenizer loads fine in Python using transformers:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Ministral-8B-Instruct-2410",
    trust_remote_code=True
)
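For reference (not something we have fully run down yet), the same Rust deserializer can also be exercised outside of TGI via the standalone tokenizers library, which wraps the same crate. A rough sketch, assuming access to the gated repo with a valid HF token and a recent tokenizers install:

from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer  # Rust-backed library, same crate family TGI uses

# Fetch only tokenizer.json (assumes HF_TOKEN / huggingface-cli login for the gated repo)
path = hf_hub_download(
    repo_id="mistralai/Ministral-8B-Instruct-2410",
    filename="tokenizer.json",
)

# Tokenizer.from_file() goes through the Rust deserializer. If it raises the same
# "data did not match any variant of untagged enum ModelWrapper" error, the file
# itself trips up the crate; if it loads fine here but still fails in TGI 1.4,
# an older crate version pinned in that container is the likely culprit.
tok = Tokenizer.from_file(path)
print(tok.encode("Hello from Ministral").tokens)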

✅ Environment and what we've checked:

  • TGI container version: ghcr.io/huggingface/text-generation-inference:1.4
  • GPU: NVIDIA T4 / A100 (GCP Vertex AI)
  • HF token: Valid and working
  • Model weights: Downloaded and verified
  • Tokenizer: Fails only inside TGI, not in Python
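It may also help to see which tokenizer artifacts the repo actually ships, in case there is an alternative file an inference backend could be pointed at. A hypothetical check along these lines (again assuming a valid HF token for the gated repo):

from huggingface_hub import list_repo_files

# List tokenizer-related files published in the model repo
files = list_repo_files("mistralai/Ministral-8B-Instruct-2410")
print(sorted(f for f in files if "token" in f.lower()))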

🔗 Related Issue filed in TGI

I’ve opened a detailed issue with Hugging Face’s TGI team here:
👉 https://github.com/huggingface/text-generation-inference/issues/3163


🙏 Request

Could you:

  • Confirm whether the tokenizer format is meant to work with TGI?
  • Provide a compatible tokenizer.json or .tokenizer file?
  • Or clarify which inference backends are supported for the 8B Instruct model?

Thank you for the amazing models and your work with the community! 🙏


