Repetition after 22k token mark.

#2
by lazyDataScientist - opened

I get this issue in both the normal Qwen3-30B-A3B model and your MAX-Q5 finetune: after the 22K token mark, the models just repeat "linglingling" forever. I have only tested this in koboldcpp (not sure about vllm, llama.cpp, LM Studio, etc.).

Also, I am using completion tasks, not instruction/chat.

Edit: Llama.cpp does not have this issue.

Hmm; it might need the instruct / ChatML template.
This model is very touchy about which template you use.
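For completion-mode use, wrapping the prompt in the ChatML format Qwen-family models are trained on may help. A minimal sketch (the system message text is just an example):

```python
# Minimal sketch of the ChatML template that Qwen-family models expect.
# Wrapping a raw completion prompt like this mimics the instruct format;
# the system message content here is only illustrative.
def to_chatml(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = to_chatml("You are a helpful assistant.", "Continue the story.")
```

The trailing `<|im_start|>assistant\n` leaves the model positioned to write its reply.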

Also: there have been recent MoE-specific updates to llama.cpp in the past few days.

Other fixes:
-> Up the rep pen
-> Increase the rep pen range -> 128 to 256 ...
-> Activate DRY.
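The fixes above map onto sampler parameters in the llama.cpp server API (koboldcpp exposes equivalent sliders in its UI). A sketch of what such a request payload might look like; the endpoint URL and the specific values are illustrative, not a tested cure:

```python
import json

# Illustrative llama.cpp /completion sampler settings for the fixes above:
# raised repeat penalty, wider penalty range, and DRY enabled via a
# non-zero dry_multiplier. Values here are starting points, not a recipe.
payload = {
    "prompt": "Once upon a time",
    "n_predict": 256,
    "repeat_penalty": 1.15,   # up the rep pen
    "repeat_last_n": 256,     # rep pen range: 128 -> 256
    "dry_multiplier": 0.8,    # non-zero activates DRY
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}
# To send: requests.post("http://localhost:8080/completion", json=payload)
print(json.dumps(payload, indent=2))
```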

I may have spoken too soon. Llama.cpp also has the issue.

I upped the rep pen range to 32k and the penalty to 1.7. Same issue.

I am implementing a rolling summary to lower the impact, but I am seeing the same issue with different models. It might be my setup.
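A rolling summary of the kind described can be sketched like this: once the context nears the ~22K-token point where repetition starts, fold the oldest chunk into a short summary. `summarize` here is a hypothetical stand-in for a real summarization call, and token counts are approximated by whitespace splitting:

```python
# Sketch of a rolling summary: keep the context under a budget that sits
# safely below the ~22K-token failure point by repeatedly replacing the
# oldest chunk with a summary. `summarize` is a hypothetical placeholder
# for an actual summarization model call; "tokens" here are just words.
TOKEN_BUDGET = 20_000   # stay below the 22K mark where repetition begins
CHUNK_TOKENS = 4_000    # how much of the oldest history to fold at a time

def summarize(text: str) -> str:
    # Placeholder: call your summarization model/prompt here.
    return f"[summary of {len(text.split())} tokens of earlier context]"

def roll_context(context: str) -> str:
    words = context.split()
    while len(words) > TOKEN_BUDGET:
        head, words = words[:CHUNK_TOKENS], words[CHUNK_TOKENS:]
        # Prepend the summary so chronology is preserved.
        words = summarize(" ".join(head)).split() + words
    return " ".join(words)
```

This only delays the problem rather than fixing it, since the model still degrades once the live context grows past the failure point.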
