Repetition after the 22k token mark.
I get this issue with both the normal Qwen3-30B-A3B model and your MAX-Q5 finetune: after the 22k token mark the models just repeat "linglingling" forever. I have only tested this in koboldcpp (not sure about vllm, llama.cpp, lmstudio, etc.).
Also, I am using completion tasks, not instruction/chat.
Edit: Llama.cpp does not have this issue.
Hmm; it might need the instruction / ChatML template.
This model is very touchy about which template you use.
Also, there have been recent MoE-specific updates to llama.cpp in the past few days.
Other fixes:
-> Raise the rep pen.
-> Increase the rep pen range from 128 to 256 ...
-> Activate DRY.
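For reference, here is a minimal sketch of how those settings might be passed to koboldcpp's generate endpoint. The field names (`rep_pen`, `rep_pen_range`, `dry_multiplier`, etc.) are my assumption based on the KoboldCpp API; the exact values are illustrative, so check them against your version's API docs before relying on this.

```python
def build_payload(prompt: str) -> dict:
    """Build a generate request with the anti-repetition fixes above.

    Field names assume KoboldCpp's /api/v1/generate schema; values are
    illustrative starting points, not recommendations from the model author.
    """
    return {
        "prompt": prompt,
        "max_length": 512,
        "rep_pen": 1.15,        # raised repetition penalty
        "rep_pen_range": 256,   # widened range (128 -> 256)
        # DRY sampler: penalizes verbatim repeats of recent sequences
        "dry_multiplier": 0.8,
        "dry_base": 1.75,
        "dry_allowed_length": 2,
    }
```

You would then POST this as JSON to the running koboldcpp server (e.g. `requests.post("http://localhost:5001/api/v1/generate", json=build_payload(...))`, assuming the default port).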
I may have spoken too soon. Llama.cpp also has the issue.
I upped the rep pen range to 32k and the penalty to 1.7. Same issue.
I am implementing a rolling summary to reduce the impact, but I am seeing the same issue with different models, so it might be my setup.
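A rolling summary like the one mentioned above could be sketched as follows: keep a short summary of older text plus a window of recent chunks, so the prompt stays well under the length where degeneration sets in. Everything here is hypothetical scaffolding, not the poster's actual code; `summarize()` is a placeholder for a real summarization call, and the 4-chars-per-token estimate is a crude assumption.

```python
def summarize(text: str) -> str:
    # Placeholder: a real implementation would call an LLM to summarize;
    # here we just truncate so the sketch stays self-contained.
    return text[:200]

def build_prompt(history: list[str], budget_tokens: int = 8000) -> str:
    """Combine a summary of old chunks with recent chunks under a token budget."""
    def n_tokens(s: str) -> int:
        return len(s) // 4  # rough chars-per-token estimate (assumption)

    recent: list[str] = []
    used = 0
    # Take chunks from newest to oldest until the budget is filled.
    for chunk in reversed(history):
        if used + n_tokens(chunk) > budget_tokens:
            break
        recent.insert(0, chunk)
        used += n_tokens(chunk)

    older = history[: len(history) - len(recent)]
    summary = summarize("\n".join(older)) if older else ""
    header = f"[Summary of earlier context]\n{summary}\n\n" if summary else ""
    return header + "\n".join(recent)
```

The idea is simply that the context sent to the model never grows past the budget, regardless of how long the session runs.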