Add the token ID of the special token '<|im_end|>' to eos_token_id in generation_config.json to fix non-stop generation when '<|im_end|>' is produced.
When using vLLM to run inference with 'Llama3-ChatQA-1.5-70B', generation does not stop when the special token '<|im_end|>' is emitted, as shown in the figure below. This PR adds the corresponding token ID to eos_token_id in generation_config.json.
In addition, '<|im_end|>' also needs to be configured in the tokenizer; see https://huggingface.co/nvidia/Llama3-ChatQA-1.5-70B/discussions/12
![8e4f01f676a0de25c1412b10172cfa9.png](https://cdn-uploads.huggingface.co/production/uploads/66161a077b605932bfbc106b/w1TlixMzjRbaK2btJvUwX.png)
- generation_config.json (+2 -1)

```diff
@@ -1,6 +1,7 @@
 {
   "_from_model_config": true,
   "bos_token_id": 128000,
-  "eos_token_id": [128001, 128009],
+  "eos_token_id": [128001, 128009, 128010],
   "transformers_version": "4.40.0.dev0"
 }
+
```
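For reference, below is a minimal sketch of how the fix can be checked, or worked around on the inference side until the updated config is in place. The token ID 128010 comes from the diff above; the tensor-parallel setting and the prompt are placeholders, and the snippet assumes the tokenizer change from the linked discussion is applied so that '<|im_end|>' resolves to 128010.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "nvidia/Llama3-ChatQA-1.5-70B"

# Check that '<|im_end|>' maps to 128010 (the ID added to eos_token_id above).
# This assumes the tokenizer change from the linked discussion is applied.
tokenizer = AutoTokenizer.from_pretrained(model_id)
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
print("<|im_end|> ->", im_end_id)  # expected: 128010

# Workaround: pass the stop token IDs explicitly to vLLM so generation halts
# at '<|im_end|>' even without the generation_config.json change.
llm = LLM(model=model_id, tensor_parallel_size=8)  # tensor_parallel_size is a placeholder
params = SamplingParams(max_tokens=256, stop_token_ids=[128001, 128009, im_end_id])
outputs = llm.generate(["System: ...\n\nUser: ...\n\nAssistant:"], params)  # placeholder prompt
print(outputs[0].outputs[0].text)
```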