TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'images' when using processor with text and images

#19
by Liang123Kun - opened

I'm encountering a TypeError when trying to pass both text and images to the processor for a multimodal model. The error suggests that PreTrainedTokenizerFast._batch_encode_plus() does not expect an images argument.

Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00, 3.19it/s]
Traceback (most recent call last):
  File "/mnt/olmOCR-7B-0225-preview/olmocr_backup.py", line 60, in <module>
    inputs = processor(
             ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 3021, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 3109, in _call_one
    return self.batch_encode_plus(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 3311, in batch_encode_plus
    return self._batch_encode_plus(
           ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'images'
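
One way to read the traceback: every library frame is in tokenization_utils_base.py, so the object bound to processor seems to be behaving as a plain tokenizer, which forwards images along with the other keyword arguments until _batch_encode_plus() rejects it. The toy sketch below only illustrates that mechanism. ToyTokenizer, ToyProcessor, is_text_only and try_call are hypothetical names made up for this example, not transformers classes, and the image_processor attribute check is just an assumed heuristic for telling the two kinds of object apart.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a text-only tokenizer (not a transformers class)."""

    def _batch_encode_plus(self, text, padding=False):
        # Only text-related keyword arguments are accepted here.
        return {"input_ids": [[len(t)] for t in text]}

    def __call__(self, text, **kwargs):
        # Extra keyword arguments are forwarded untouched, so an unknown
        # one such as `images` only fails once it reaches the inner method.
        return self._batch_encode_plus(text, **kwargs)


class ToyProcessor:
    """Hypothetical stand-in for a multimodal processor wrapping a tokenizer."""

    def __init__(self):
        self.tokenizer = ToyTokenizer()
        self.image_processor = object()  # placeholder, does no real work

    def __call__(self, text, images=None, **kwargs):
        # `images` is consumed here and never reaches the tokenizer.
        encoded = self.tokenizer(text, **kwargs)
        encoded["num_images"] = 0 if images is None else len(images)
        return encoded


def is_text_only(obj):
    """Assumed heuristic: no image_processor attribute means text-only."""
    return not hasattr(obj, "image_processor")


def try_call(obj):
    """Call obj with text and images; return 'ok' or the TypeError message."""
    try:
        obj(text=["hello"], images=["fake-image"], padding=True)
        return "ok"
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    for candidate in (ToyTokenizer(), ToyProcessor()):
        print(type(candidate).__name__, is_text_only(candidate), try_call(candidate))
```

If the same kind of check on the real object (for example, printing type(processor)) shows a tokenizer class rather than a processor class, that would point at how the processor was loaded rather than at the call on line 60.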
