Add the special token '<|im_end|>' to generation_config.json to fix generation not stopping when '<|im_end|>' is encountered
#13 by zjyhf - opened
When running inference on 'Llama3-ChatQA-1.5-70B' with vLLM, generation keeps going after the special token '<|im_end|>' is emitted (see the attached screenshot). This PR adds a mapping for '<|im_end|>' to generation_config.json so that decoding stops on it.
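For illustration, the change could be expressed through the transformers API roughly as follows. This is only a hedged sketch, not the exact diff in this PR, and it assumes '<|im_end|>' is already present in the tokenizer vocabulary (see the tokenizer note below):

```python
from transformers import AutoTokenizer, GenerationConfig

# Hedged sketch: append the id of '<|im_end|>' to eos_token_id in
# generation_config.json so decoding stops when that token is produced.
model_id = "nvidia/Llama3-ChatQA-1.5-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
gen_config = GenerationConfig.from_pretrained(model_id)

im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
eos_ids = gen_config.eos_token_id
eos_ids = eos_ids if isinstance(eos_ids, list) else [eos_ids]
if im_end_id is not None and im_end_id not in eos_ids:
    eos_ids.append(im_end_id)
gen_config.eos_token_id = eos_ids

# Writes the updated generation_config.json to the current directory.
gen_config.save_pretrained(".")
```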
In addition, '<|im_end|>' also needs to be configured in the tokenizer; see https://huggingface.co/nvidia/Llama3-ChatQA-1.5-70B/discussions/12
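Until both changes land, a per-request workaround in vLLM is to list '<|im_end|>' as a stop string. A minimal sketch (the prompt is a placeholder and the tensor-parallel size is an assumption for a 70B model):

```python
from vllm import LLM, SamplingParams

# Hedged workaround sketch: ask vLLM to stop on the literal string
# '<|im_end|>' per request, independent of generation_config.json.
llm = LLM(model="nvidia/Llama3-ChatQA-1.5-70B", tensor_parallel_size=4)
params = SamplingParams(temperature=0.0, max_tokens=256, stop=["<|im_end|>"])

prompt = "System: ...\n\nUser: ...\n\nAssistant:"  # placeholder prompt
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```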