[Bug]: Phi-4-Mini giving garbage outputs with torch 2.5.1 and vllm==0.7.3 with multiple parallel requests on long-context prompts #14058
I am sorry to reopen this issue, but I was not able to resolve my query mentioned here: https://github.com/vllm-project/vllm/issues/14058
I have added all the details needed to reproduce the issue in a comment on the PR.
Basically, the model behaves differently under sequential execution vs. parallel execution!
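For context, here is a generic sketch of what "sequential vs. parallel" means here, issued against a vLLM OpenAI-compatible server; the endpoint, model name, and prompt list are placeholders, and the actual repro details are in the linked comment, not this snippet.

```python
import asyncio
from openai import AsyncOpenAI

# Placeholder endpoint for a locally running vLLM OpenAI-compatible server.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="microsoft/Phi-4-mini-instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.choices[0].message.content

async def compare(prompts: list[str]) -> None:
    # Sequential: one long-context request at a time.
    sequential = [await ask(p) for p in prompts]
    # Parallel: the same prompts issued concurrently (where garbage outputs were observed).
    parallel = await asyncio.gather(*(ask(p) for p in prompts))
    for s, p in zip(sequential, parallel):
        print("match:", s == p)

# asyncio.run(compare(long_context_prompts))  # long_context_prompts is a placeholder
```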
Hi @raghavgg,
One potential workaround for your scenario is to copy the values from `long_factor` into `short_factor` in config.json (https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/d02e859785d6a6ee7cb2ed7913e32b7a0e8665b4/config.json#L86), so that `long_factor` and `short_factor` are the same.
Can you try that?
Thanks.
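For reference, a minimal sketch of that edit applied to a local copy of the model; the `rope_scaling` key names are assumed from the linked config.json, and the local path is just an example.

```python
import json

# Example local path to a downloaded copy of the model config.
path = "Phi-4-mini-instruct/config.json"

with open(path) as f:
    config = json.load(f)

# Overwrite short_factor with the long_factor values so both are identical
# (assumes the config has a "rope_scaling" section with these two lists).
config["rope_scaling"]["short_factor"] = config["rope_scaling"]["long_factor"]

with open(path, "w") as f:
    json.dump(config, f, indent=2)
```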
Thanks! I will give it a try and let you know.
Hello, this is working well for me! Thanks a lot.
> Hi @raghavgg,
> One potential workaround for your scenario is to copy the values from `long_factor` into `short_factor` in config.json (https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/d02e859785d6a6ee7cb2ed7913e32b7a0e8665b4/config.json#L86), so that `long_factor` and `short_factor` are the same.
> Can you try that?
> Thanks.
Hi @ykim362, may I know if this is an officially suggested solution for running Phi-4-mini-instruct? Thanks a lot.
Will this modification be made in the latest version of the config file? Or is this just a patch for certain scenarios (specific hardware, etc.)?