Long repetitions
The model won't stop after an answer; it repeats again and again and can produce very long output. I have to kill the process to stop it.
Yeah, same problem - it keeps generating content continuously, or sometimes stops after "The". The first message always seems to be OK.
Are you experiencing this with all quants? I will be regenerating them after this PR is merged: https://github.com/ggerganov/llama.cpp/pull/6920
Hi @nullt3r @subbur, can you please test with the 1048k context model? This was generated with the above PR merged, so it should benefit from the tokenization fixes as well as additional training: https://huggingface.co/crusoeai/Llama-3-8B-Instruct-1048k-GGUF
It works much better - it's a functional model now. Thanks.
Glad to hear it!
For 1048k, the repeat issue is also there. That is actually why I went to 262k, to check whether this version has the same problem.
I use Q8_0 with the prompt "Introduce Kobe". It generated 2700+ tokens before I stopped it manually. With a repeat_penalty the issue improves, but it still has a chance of outputting endlessly, just not with identical content.
For long-context prompts it does not seem to have this issue, so I take it as a defect of the model rather than of the quant.
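For reference, this is roughly the kind of run I mean, a minimal sketch assuming llama.cpp's `main` example binary (the model filename is just my local copy, and 1.1 is only the penalty value I happened to try):

```sh
# illustrative model filename; adjust to wherever your quant lives
./main -m Llama-3-8B-Instruct-262k.Q8_0.gguf \
  --repeat-penalty 1.1 \
  -n 2048 \
  -p "Introduce Kobe"
# -n caps the number of generated tokens, so a runaway generation
# stops on its own instead of needing a manual kill
```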
@Starlento can you try setting the eos token to 128009 using the gguf-set-metadata.py script?
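Something like this should do it, assuming a llama.cpp checkout (the script lives under gguf-py/scripts/ there; the model filename is just an example):

```sh
# tokenizer.ggml.eos_token_id is the GGUF metadata key for the eos token
python gguf-py/scripts/gguf-set-metadata.py Llama-3-8B-Instruct-262k.Q8_0.gguf \
  tokenizer.ggml.eos_token_id 128009
```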
edit: I also just updated the models with the BPE tokenization fixes from llama.cpp