chat_template fails to process image input
Thanks for open-sourcing your great work!
I would like to evaluate your OVR-7B models on MathVista and MathVision using vLLM's offline model API (i.e., `from vllm import LLM`). However, when I load the model with vLLM and build a chat prompt via `processor.apply_chat_template`, as I do for Qwen2.5-VL-7B, the image blocks are not converted into `<|vision_start|><|image_pad|><|vision_end|>` tokens, and an error is thrown saying that a list cannot be concatenated with a str. Manually replacing `processor.chat_template` with Qwen2.5-VL's template does let the model produce responses, but I'm not sure that's the intended approach.
Could you kindly confirm the correct way to pass images (or share an official multimodal template/snippet)? Even a minimal example would help. Thanks!
Hi, we've updated the chat template for the model to fix the issue.
Please pull the latest version from the Hub, and it should now work as expected. Thanks for bringing this to our attention!
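For anyone who hits this before pulling the update, here is a minimal pure-Python sketch (not the actual chat template, just an illustration of the behavior it should have) of how a Qwen2.5-VL-style template is expected to flatten multimodal message content. The `render_message` helper and the example message are hypothetical; the point is that when `content` is a list of typed blocks, each image block must be expanded into the `<|vision_start|><|image_pad|><|vision_end|>` placeholder. A template that assumes `content` is always a plain string instead produces the "list cannot concat with str" error described above.

```python
# Illustrative sketch only -- mimics what the fixed chat template should do
# when a message's `content` is a list of typed blocks rather than a string.

VISION_BLOCK = "<|vision_start|><|image_pad|><|vision_end|>"

def render_message(message: dict) -> str:
    content = message["content"]
    if isinstance(content, str):
        # Plain-text message: nothing to expand.
        return content
    parts = []
    for item in content:
        # Multimodal message: a list of {"type": ...} blocks.
        if item.get("type") == "image":
            parts.append(VISION_BLOCK)        # image -> placeholder tokens
        elif item.get("type") == "text":
            parts.append(item["text"])
    return "".join(parts)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "figure.png"},
        {"type": "text", "text": "Solve the problem shown in the image."},
    ],
}]

prompt = render_message(messages[0])
# -> "<|vision_start|><|image_pad|><|vision_end|>Solve the problem shown in the image."
```

With the updated template on the Hub, `processor.apply_chat_template(messages, ...)` should perform this expansion for you, so no manual template replacement is needed.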