Fix BaichuanTokenizer for compatibility with transformers>=4.34
#8 opened by xu-song
$ python predict_baichuan.py
Traceback (most recent call last):
  File "/workspace/baichuan/predict/predict_baichuan.py", line 14, in <module>
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False, trust_remote_code=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 755, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py", line 75, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py", line 109, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
  File "/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py", line 105, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'BaichuanTokenizer' object has no attribute 'sp_model'
Related issue https://github.com/InternLM/InternLM/pull/419/files
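The traceback shows the cause: since transformers 4.34, PreTrainedTokenizer.__init__ calls _add_tokens(), which goes through get_vocab() and vocab_size before the subclass __init__ has assigned self.sp_model. The usual fix, as in the InternLM change linked above, is to load the SentencePiece model before calling super().__init__(). A minimal sketch, assuming tokenization_baichuan.py follows the LlamaTokenizer layout; the exact keyword arguments of the upstream file are abridged here:

    # Sketch of the reordered __init__ (illustrative, not the full upstream file).
    import sentencepiece as spm
    from transformers import PreTrainedTokenizer

    class BaichuanTokenizer(PreTrainedTokenizer):
        def __init__(self, vocab_file, sp_model_kwargs=None, **kwargs):
            self.vocab_file = vocab_file
            self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs

            # Load the SentencePiece model *before* super().__init__():
            # in transformers>=4.34 the base __init__ calls _add_tokens()
            # -> get_vocab() -> vocab_size, all of which need self.sp_model.
            self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
            self.sp_model.Load(vocab_file)

            super().__init__(**kwargs)

        @property
        def vocab_size(self):
            return self.sp_model.get_piece_size()

        def get_vocab(self):
            vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
            vocab.update(self.added_tokens_encoder)
            return vocab

With self.sp_model assigned first, get_vocab() succeeds during super().__init__(), and the same ordering still works on older transformers releases, so the change is backward compatible.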
GradientGuru changed pull request status to merged