[Bug]: Phi-4-Mini giving garbage outputs with torch 2.5.1 and vllm==0.7.3 with multiple parallel requests on long-context prompts #14058

#11
by raghavgg - opened

I am sorry to reopen this issue, but I was not able to resolve my query mentioned here: https://github.com/vllm-project/vllm/issues/14058

I have added all the details needed to reproduce the issue in a comment on the PR.
Basically, the model behaves differently under sequential execution vs. parallel execution!
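Roughly, the comparison looks like the sketch below (the base URL, model name, and prompts are placeholders for illustration, not the exact reproduction script from the PR):

```python
# Sketch: send the same long prompts sequentially and then concurrently to a
# vLLM OpenAI-compatible server, and compare the outputs.
import concurrent.futures
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint
MODEL = "microsoft/Phi-4-mini-instruct"
prompts = ["<long context prompt 1>", "<long context prompt 2>"]  # placeholders

def generate(prompt: str) -> str:
    resp = client.completions.create(model=MODEL, prompt=prompt, max_tokens=256)
    return resp.choices[0].text

# Sequential: one request at a time.
sequential_outputs = [generate(p) for p in prompts]

# Parallel: the same prompts issued concurrently.
with concurrent.futures.ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    parallel_outputs = list(pool.map(generate, prompts))

# In the reported behavior, the parallel outputs degrade relative to the sequential ones.
for seq, par in zip(sequential_outputs, parallel_outputs):
    print(seq == par)
```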

@ykim362 @nguyenbh I would be thankful if anyone could share some insight on where I am going wrong.
Thanks!

Hi @raghavgg
One potential workaround for your scenario is to copy the values in long_factor and paste them into short_factor in the config.json. (https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/d02e859785d6a6ee7cb2ed7913e32b7a0e8665b4/config.json#L86)
That makes long_factor and short_factor the same.
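As a rough sketch, the edit could be scripted like this (CONFIG_PATH is a placeholder for wherever your local copy of the model's config.json lives, and the rope_scaling layout is assumed from the linked config.json):

```python
import json

# Placeholder path to a local copy of the model's config.json
# (e.g. inside a huggingface snapshot directory).
CONFIG_PATH = "Phi-4-mini-instruct/config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

# Copy long_factor over short_factor so both lists are identical.
# Assumes both keys sit under "rope_scaling", as in the linked config.json.
rope_scaling = config["rope_scaling"]
rope_scaling["short_factor"] = list(rope_scaling["long_factor"])

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```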
Can you try that?
Thanks.

Thanks! I will give it a try and let you know.
