Unstable output.
#6 opened by william0014
I use the same prompt with temperature set to 0, so why are the model's replies different every time? Sometimes even the meaning of the content differs substantially. What settings do I need for stable output? The inference framework is vLLM. I tried Llama3.1-chinese 8B the same way, and its replies are very stable.
Hi, please refer to vllm's documentation on this matter: https://docs.vllm.ai/en/stable/serving/faq.html
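As the linked FAQ notes, fixing the sampling parameters helps but does not guarantee bitwise-identical outputs. A minimal sketch of greedy decoding with a fixed seed in vLLM (the model name here is a placeholder; substitute the actual checkpoint you are serving):

```python
from vllm import LLM, SamplingParams

# Greedy decoding: temperature=0 picks the argmax token at each step.
# seed pins the sampler's RNG, but kernel-level non-determinism
# (e.g. in quantized GPTQ kernels) can still cause small variations.
params = SamplingParams(temperature=0, seed=42, max_tokens=128)

llm = LLM(model="your-model-name-here")  # placeholder model id
outputs = llm.generate(["你好，请介绍一下你自己。"], params)
print(outputs[0].outputs[0].text)
```

Note that this requires a GPU environment with vLLM installed; it is a configuration sketch, not a guarantee of determinism across runs or hardware.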
In addition, IIRC, the GPTQ kernel implementation in vLLM is not deterministic, which can also contribute to output variations.
jklj077 changed discussion status to closed