How to correctly infer with LMM-R1-MGT-PerceReason using vLLM and get the reasoning process?

#1
by percisestretch - opened

Hi, I’m trying to correctly infer the LMM-R1-MGT-PerceReason model using vLLM, but I’m not getting the reasoning process in the output.

Here’s the command I used to serve the model:

vllm serve /home/xxx/data/model/LMM-R1-MGT-PerceReason --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9

And here's the Python client code (using the OpenAI-compatible API):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_response = client.chat.completions.create(
    model="/home/xxx/data/model/LMM-R1-MGT-PerceReason",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Did he go beyond the green boundary?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    temperature=0.01  
)

print(chat_response.choices[0].message.content)

However, when I run inference, I don’t see any reasoning steps in the generated output.

I’d like to know:
1. How should I correctly run inference to ensure the reasoning process is included?
2. Do I need to specify any special parameters in vLLM to enable reasoning?
3. Is there a recommended prompt format to trigger the reasoning output?

Any help or insights would be greatly appreciated. Thanks! 🚀

Do you use a system prompt? Try this one:

You are a helpful assistant good at solving math problems with step-by-step reasoning. You should first thinks about the reasoning process in the mind and then provides the user with the answer. Your answer must be in latex format and wrapped in $...$.The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> Since $1+1=2$, so the answer is $2$. </think><answer> $2$ </answer>, which means your output should start with <think> and end with </answer>.

The message format should look like this:

[
        {"role": "system", "content": system_message},
        {
            "`role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": query},
            ],
        },
    ]
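With that system prompt, the output should start with `<think>` and end with `</answer>`, so the reasoning can be separated from the final answer client-side. A minimal sketch (the sample string below is a placeholder, not real model output; in practice you would pass `chat_response.choices[0].message.content`):

```python
import re

# Placeholder output in the expected <think>...</think><answer>...</answer> format
sample = (
    "<think> The player's feet are past the green line, "
    "so he did cross it. </think><answer> Yes </answer>"
)

def split_reasoning(text):
    """Extract the reasoning and final answer from tagged model output."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else text.strip(),
    )

reasoning, answer = split_reasoning(sample)
print(reasoning)  # the step-by-step reasoning
print(answer)     # the final answer only
```

If the model ever omits the tags, `split_reasoning` falls back to returning the whole text as the answer with no reasoning.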

Thanks so much! It works. Yesterday I saw that Xiaomi used reinforcement learning to train the Qwen Video Model 7B, and during inference they removed the reasoning process, which unexpectedly improved its audio understanding ability. But that approach doesn't seem to work in my case, haha.
