When using vllm to run inference with 'Llama3-ChatQA-1.5-8B', generation does not stop when the special token '<|im_end|>' is encountered, as shown in the figure below. This PR adds <|im_end|> to the tokenizer; the corresponding mapping also needs to be added to generation_config.json.
@zjyhf To be clear, are you saying this model has an incorrect mapping of token id 128010 to the string value "<|reserved_special_token_5|>"? If the mapping is not incorrect, then you can use vllm's "stop" param to pass extra tokens you want to treat as stop tokens in addition to EOS.
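For reference, a minimal sketch of what the "stop" approach looks like with vllm's offline API (the model path, prompt, and the exact stop string / token id are illustrative and depend on your setup):

```python
# Sketch: stopping on an extra string via SamplingParams, in addition to EOS.
from vllm import LLM, SamplingParams

llm = LLM(model="nvidia/Llama3-ChatQA-1.5-8B")  # illustrative model path

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=256,
    # Stop generation when this string is produced, in addition to the model's EOS.
    stop=["<|im_end|>"],
    # Or, if the relevant token id is known (e.g. 128010 from the discussion above),
    # stop on the id directly instead:
    # stop_token_ids=[128010],
)

outputs = llm.generate(["What is the capital of France?"], sampling_params)
print(outputs[0].outputs[0].text)
```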