[Bug]: Phi-4-Mini giving garbage outputs with torch 2.5.1 and vllm==0.7.3 with multiple parallel requests on long-context prompts #14058

#11
by raghavgg - opened

I am sorry to reopen this issue, but I was not able to resolve my query mentioned here: https://github.com/vllm-project/vllm/issues/14058

I have added all the details needed to reproduce the issue in a comment on the PR.
Basically, the model behaves differently under sequential execution vs. parallel execution!
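Roughly, the comparison looks like the sketch below (the base URL, model name, and prompts are placeholders for illustration, not the exact reproduction script from the PR):

```python
# Sketch: send the same long prompts sequentially and then concurrently to a
# vLLM OpenAI-compatible server, and compare the outputs.
import concurrent.futures
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint
MODEL = "microsoft/Phi-4-mini-instruct"
prompts = ["<long context prompt 1>", "<long context prompt 2>"]  # placeholders

def generate(prompt: str) -> str:
    resp = client.completions.create(model=MODEL, prompt=prompt, max_tokens=256)
    return resp.choices[0].text

# Sequential: one request at a time.
sequential_outputs = [generate(p) for p in prompts]

# Parallel: the same prompts issued concurrently.
with concurrent.futures.ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    parallel_outputs = list(pool.map(generate, prompts))

# In the reported behavior, the parallel outputs degrade relative to the sequential ones.
for seq, par in zip(sequential_outputs, parallel_outputs):
    print(seq == par)
```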

@ykim362 @nguyenbh I would be thankful if anyone could share some insight on where I am going wrong.
Thanks!

Hi @raghavgg
One potential workaround for your scenario is to copy the values in long_factor and paste them into short_factor in the config.json. (https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/d02e859785d6a6ee7cb2ed7913e32b7a0e8665b4/config.json#L86)
That makes long_factor and short_factor the same.
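As a rough sketch, the edit could be scripted like this (CONFIG_PATH is a placeholder for wherever your local copy of the model's config.json lives, and the rope_scaling layout is assumed from the linked config.json):

```python
import json

# Placeholder path to a local copy of the model's config.json
# (e.g. inside a huggingface snapshot directory).
CONFIG_PATH = "Phi-4-mini-instruct/config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

# Copy long_factor over short_factor so both lists are identical.
# Assumes both keys sit under "rope_scaling", as in the linked config.json.
rope_scaling = config["rope_scaling"]
rope_scaling["short_factor"] = list(rope_scaling["long_factor"])

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```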
Can you try that?
Thanks.

Thanks! I will give it a try and let you know.
