TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'images' when using processor with text and images
I'm encountering a TypeError when trying to pass both text and images to the processor for a multimodal model. The error indicates that PreTrainedTokenizerFast._batch_encode_plus() does not accept an images argument, as if the call were being handled by the tokenizer alone rather than by the full processor.
Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00, 3.19it/s]
Traceback (most recent call last):
  File "/mnt/olmOCR-7B-0225-preview/olmocr_backup.py", line 60, in <module>
    inputs = processor(
             ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 3021, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 3109, in _call_one
    return self.batch_encode_plus(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 3311, in batch_encode_plus
    return self._batch_encode_plus(
           ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'images'
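For context on what I expected to happen: my understanding is that a multimodal processor splits the keywords, routing text to the tokenizer and images to an image processor, while a bare tokenizer forwards every keyword into its encode path. A toy sketch of that dispatch (made-up classes, not the real transformers internals):

```python
# Toy sketch of processor-vs-tokenizer dispatch; class names and logic
# are made up for illustration, not the real transformers internals.
class FakeTokenizer:
    def __call__(self, text=None, **kwargs):
        # Every unrecognized keyword falls through to the encode method,
        # which is where the TypeError in the traceback is raised.
        return self._batch_encode_plus(text, **kwargs)

    def _batch_encode_plus(self, batch_text, padding=False):
        return {"input_ids": [[len(t)] for t in batch_text]}


class FakeProcessor:
    # A processor strips multimodal keywords before they reach the tokenizer.
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, text=None, images=None, **kwargs):
        encoding = self.tokenizer(text=text, **kwargs)
        if images is not None:
            # Placeholder for real image features (e.g. pixel_values).
            encoding["pixel_values"] = [[0.0] for _ in images]
        return encoding


tokenizer = FakeTokenizer()
processor = FakeProcessor(tokenizer)
print(processor(text=["hi"], images=["img"]))  # works: images handled by the processor
try:
    tokenizer(text=["hi"], images=["img"])     # fails like my script does
except TypeError as exc:
    print(exc)
```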