chat_template fails to process image input
Thanks for open-sourcing your great work!
I would like to evaluate your OVR-7B models on MathVista and MathVision using vLLM's offline model API (i.e., `from vllm import LLM`). However, when I load the model with vLLM and build a chat prompt via `processor.apply_chat_template`, as I do for Qwen2.5-VL-7B, the image blocks are not converted into `<|vision_start|><|image_pad|><|vision_end|>` tokens, and an error is thrown saying that a list cannot be concatenated with a str. Manually replacing `processor.chat_template` with Qwen2.5-VL's template does let the model produce responses, but I'm not sure that's the intended approach.
Could you kindly confirm the correct way to pass images (or share an official multimodal template/snippet)? Even a minimal example would help. Thanks!
Hi, we've updated the chat template for the model to fix the issue.
Please pull the latest version from the Hub, and it should now work as expected. Thanks for bringing this to our attention!
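For anyone who hits this before pulling the update, here is a minimal pure-Python sketch (not the actual chat template, just an illustration of the behavior it should have) of how a Qwen2.5-VL-style template is expected to flatten multimodal message content. The `render_message` helper and the example message are hypothetical; the point is that when `content` is a list of typed blocks, each image block must be expanded into the `<|vision_start|><|image_pad|><|vision_end|>` placeholder. A template that assumes `content` is always a plain string instead produces the "list cannot concat with str" error described above.

```python
# Illustrative sketch only -- mimics what the fixed chat template should do
# when a message's `content` is a list of typed blocks rather than a string.

VISION_BLOCK = "<|vision_start|><|image_pad|><|vision_end|>"

def render_message(message: dict) -> str:
    content = message["content"]
    if isinstance(content, str):
        # Plain-text message: nothing to expand.
        return content
    parts = []
    for item in content:
        # Multimodal message: a list of {"type": ...} blocks.
        if item.get("type") == "image":
            parts.append(VISION_BLOCK)        # image -> placeholder tokens
        elif item.get("type") == "text":
            parts.append(item["text"])
    return "".join(parts)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "figure.png"},
        {"type": "text", "text": "Solve the problem shown in the image."},
    ],
}]

prompt = render_message(messages[0])
# -> "<|vision_start|><|image_pad|><|vision_end|>Solve the problem shown in the image."
```

With the updated template on the Hub, `processor.apply_chat_template(messages, ...)` should perform this expansion for you, so no manual template replacement is needed.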