Unstable output.
#6 opened by william0014
I use the same prompt with temperature set to 0, so why are the model's replies different every time? Sometimes even the meaning of the content differs substantially. What settings do I need for stable output? The inference framework is vLLM. I tried Llama3.1-chinese 8B the same way, and its replies are very stable.
Hi, please refer to vllm's documentation on this matter: https://docs.vllm.ai/en/stable/serving/faq.html
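As the linked FAQ notes, fixing the sampling parameters helps but does not guarantee bitwise-identical outputs. A minimal sketch of greedy decoding with a fixed seed in vLLM (the model name here is a placeholder; substitute the actual checkpoint you are serving):

```python
from vllm import LLM, SamplingParams

# Greedy decoding: temperature=0 picks the argmax token at each step.
# seed pins the sampler's RNG, but kernel-level non-determinism
# (e.g. in quantized GPTQ kernels) can still cause small variations.
params = SamplingParams(temperature=0, seed=42, max_tokens=128)

llm = LLM(model="your-model-name-here")  # placeholder model id
outputs = llm.generate(["你好，请介绍一下你自己。"], params)
print(outputs[0].outputs[0].text)
```

Note that this requires a GPU environment with vLLM installed; it is a configuration sketch, not a guarantee of determinism across runs or hardware.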
In addition, IIRC, the GPTQ kernel implementation in vLLM is not deterministic, which can also contribute to output variations.
jklj077 changed discussion status to closed