Repetition after 22k token mark.

#2
by lazyDataScientist - opened

I get this issue in both the normal Qwen3-30B-A3B model and your MAX-Q5 finetune: after the 22K token mark, the models just repeat "linglingling" forever. I have only tested this in koboldcpp (not sure about vllm, llama.cpp, LM Studio, etc.).

Also, I am using completion tasks, not instruction/chat.

Edit: Llama.cpp does not have this issue.

Hmm; it might need the instruct / ChatML template.
This model is very touchy about which template you use.
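For completion-mode use, wrapping the prompt in the ChatML format Qwen-family models are trained on may help. A minimal sketch (the system message text is just an example):

```python
# Minimal sketch of the ChatML template that Qwen-family models expect.
# Wrapping a raw completion prompt like this mimics the instruct format;
# the system message content here is only illustrative.
def to_chatml(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = to_chatml("You are a helpful assistant.", "Continue the story.")
```

The trailing `<|im_start|>assistant\n` leaves the model positioned to write its reply.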

Also: there have been recent MoE-specific updates to llama.cpp in the past few days.

Other fixes:
-> Up the rep pen
-> Increase the rep pen range -> 128 to 256 ...
-> Activate DRY.
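The fixes above map onto sampler parameters in the llama.cpp server API (koboldcpp exposes equivalent sliders in its UI). A sketch of what such a request payload might look like; the endpoint URL and the specific values are illustrative, not a tested cure:

```python
import json

# Illustrative llama.cpp /completion sampler settings for the fixes above:
# raised repeat penalty, wider penalty range, and DRY enabled via a
# non-zero dry_multiplier. Values here are starting points, not a recipe.
payload = {
    "prompt": "Once upon a time",
    "n_predict": 256,
    "repeat_penalty": 1.15,   # up the rep pen
    "repeat_last_n": 256,     # rep pen range: 128 -> 256
    "dry_multiplier": 0.8,    # non-zero activates DRY
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}
# To send: requests.post("http://localhost:8080/completion", json=payload)
print(json.dumps(payload, indent=2))
```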

I may have spoken too soon. Llama.cpp also has the issue.

I upped the rep pen range to 32k and the penalty to 1.7. Same issue.

I am implementing a rolling summary to lower the impact, but I am seeing the same issue with different models. It might be my setup.
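A rolling summary of the kind described can be sketched like this: once the context nears the ~22K-token point where repetition starts, fold the oldest chunk into a short summary. `summarize` here is a hypothetical stand-in for a real summarization call, and token counts are approximated by whitespace splitting:

```python
# Sketch of a rolling summary: keep the context under a budget that sits
# safely below the ~22K-token failure point by repeatedly replacing the
# oldest chunk with a summary. `summarize` is a hypothetical placeholder
# for an actual summarization model call; "tokens" here are just words.
TOKEN_BUDGET = 20_000   # stay below the 22K mark where repetition begins
CHUNK_TOKENS = 4_000    # how much of the oldest history to fold at a time

def summarize(text: str) -> str:
    # Placeholder: call your summarization model/prompt here.
    return f"[summary of {len(text.split())} tokens of earlier context]"

def roll_context(context: str) -> str:
    words = context.split()
    while len(words) > TOKEN_BUDGET:
        head, words = words[:CHUNK_TOKENS], words[CHUNK_TOKENS:]
        # Prepend the summary so chronology is preserved.
        words = summarize(" ".join(head)).split() + words
    return " ".join(words)
```

This only delays the problem rather than fixing it, since the model still degrades once the live context grows past the failure point.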
