How to correctly infer with LMM-R1-MGT-PerceReason using vLLM and get the reasoning process?

#1
by percisestretch - opened

Hi, I’m trying to correctly infer the LMM-R1-MGT-PerceReason model using vLLM, but I’m not getting the reasoning process in the output.

Here’s the command I used to serve the model:

vllm serve /home/xxx/data/model/LMM-R1-MGT-PerceReason --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9

And here's the Python client code (using the OpenAI-compatible API):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_response = client.chat.completions.create(
    model="/home/xxx/data/model/LMM-R1-MGT-PerceReason",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Did he go beyond the green boundary?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    temperature=0.01  
)

print(chat_response.choices[0].message.content)

However, when I run inference, I don’t see any reasoning steps in the generated output.

I’d like to know:
1. How should I correctly run inference to ensure the reasoning process is included?
2. Do I need to specify any special parameters in vLLM to enable reasoning?
3. Is there a recommended prompt format to trigger the reasoning output?

Any help or insights would be greatly appreciated. Thanks! 🚀

Do you use a system prompt? Try this one:

You are a helpful assistant good at solving math problems with step-by-step reasoning. You should first thinks about the reasoning process in the mind and then provides the user with the answer. Your answer must be in latex format and wrapped in $...$.The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> Since $1+1=2$, so the answer is $2$. </think><answer> $2$ </answer>, which means your output should start with <think> and end with </answer>.

The message format should look like this:

[
        {"role": "system", "content": system_message},
        {
            "`role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": query},
            ],
        },
    ]
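With that system prompt, the output should start with `<think>` and end with `</answer>`, so the reasoning can be separated from the final answer client-side. A minimal sketch (the sample string below is a placeholder, not real model output; in practice you would pass `chat_response.choices[0].message.content`):

```python
import re

# Placeholder output in the expected <think>...</think><answer>...</answer> format
sample = (
    "<think> The player's feet are past the green line, "
    "so he did cross it. </think><answer> Yes </answer>"
)

def split_reasoning(text):
    """Extract the reasoning and final answer from tagged model output."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else text.strip(),
    )

reasoning, answer = split_reasoning(sample)
print(reasoning)  # the step-by-step reasoning
print(answer)     # the final answer only
```

If the model ever omits the tags, `split_reasoning` falls back to returning the whole text as the answer with no reasoning.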

Thanks so much! It works. Yesterday I saw that Xiaomi used reinforcement learning to train the Qwen Video Model 7B, and during inference they removed the reasoning process, which unexpectedly improved its audio understanding ability. But that approach doesn't seem to work in my case, haha.
