Trying to make a DWQ version

#3
by atkr - opened

Hi Goekdeniz-Guelmez,

Thanks for your work on the Josiefied/abliterated qwen3 models, much appreciated!

I've been trying to make a DWQ version of it as follows (with the goal of keeping great quality AND getting ~50% faster tokens per second):

mlx_lm.dwq --model mlx-community/Josiefied-Qwen3-30B-A3B-abliterated-v2-8bit --quantized-model mlx-community/Josiefied-Qwen3-30B-A3B-abliterated-v2-4bit --mlx-path Josiefied-Qwen3-30B-A3B-abliterated-v2-DWQ-4bit --max-seq-length 512 --batch-size 1 --learning-rate 1e-5 --num-samples 4096

The resulting model/quant is functional and usable, but it easily gets lost in endless repetition loops when asked anything relatively complex. Context size is not the issue, as I have it maxed out and it doesn't get anywhere close to full in my tests. (e.g.: ask it to write a 10,000-word story and it just never stops, producing a never-ending story with tons of duplicate chapters and content.)

Would you happen to have any tips and tricks to share that could help with this situation? I'm also interested in making a higher-than-4-bit DWQ quant, but am having a hard time finding relevant documentation or examples.
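
In case it's useful context, this is roughly what I was planning to try for a 6-bit version. It's only an unverified sketch: build a fresh 6-bit quant with mlx_lm's Python convert API, then feed it to the same mlx_lm.dwq command as above via --quantized-model (keeping the 8-bit as --model). The original-weights repo path below is a placeholder.

from mlx_lm import convert

# Make a fresh 6-bit quant to use as the DWQ student.
# "<original-bf16-repo>" is a placeholder for the unquantized
# Josiefied-Qwen3-30B-A3B-abliterated-v2 weights.
convert(
    "<original-bf16-repo>",
    mlx_path="Josiefied-Qwen3-30B-A3B-abliterated-v2-6bit",
    quantize=True,
    q_bits=6,
)
# Then pass Josiefied-Qwen3-30B-A3B-abliterated-v2-6bit as --quantized-model
# to the same mlx_lm.dwq command shown above.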

Thanks!

For what it's worth, I've tried a few other Qwen3-30B-A3B DWQ-4bit models I found on Hugging Face, and they seemingly all have the same issue.
This prompt just goes on forever, repeating whole sentences, paragraphs and even chapters of the never-ending story, but it works "fine" with the 8-bit mlx quant (it actually ends before 10,000 words, but at least it doesn't go out of control).
"""
write a 10000 word story and provide the whole story in your response. There is no character limit in your response.
"""
(This is with no system prompt, 40960 context, temp 0.6, top-k 20, repeat penalty 1.1, min-p 0, top-p 0.95.)
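
For reference, here's roughly how I reproduce those settings with mlx_lm's Python API when testing outside the app. This is only a sketch and assumes the sampler helpers in my installed mlx_lm version accept these arguments:

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

# Local folder produced by the mlx_lm.dwq command above
model, tokenizer = load("Josiefied-Qwen3-30B-A3B-abliterated-v2-DWQ-4bit")

# Same settings as in the test above: temp 0.6, top-k 20, top-p 0.95, min-p 0, repeat penalty 1.1
sampler = make_sampler(temp=0.6, top_p=0.95, min_p=0.0, top_k=20)
logits_processors = make_logits_processors(repetition_penalty=1.1)

messages = [{"role": "user", "content": "write a 10000 word story and provide the whole story in your response. There is no character limit in your response."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=40960,  # matches the 40960 context used above
    sampler=sampler,
    logits_processors=logits_processors,
    verbose=True,
)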

Hey,

Thanks for trying it out. The reason is the training these models went through in general: due to the inherent nature of reinforcement learning (RL) training, they perform very well on logical, math, and coding tasks (i.e. reasoning), but perform poorly at creative writing, storytelling, etc. They tend to overthink, since that's not their strong suit. Abliterating it on top significantly reduces output quality, and since this is a bigger model I couldn't train it as well as the 8B. I suggest turning off the reasoning via /no_think and guiding it with a system prompt.
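
Something along these lines with mlx_lm, for example (just a sketch; adjust the system prompt to whatever style you want, the key parts are the system message and appending /no_think to the user turn):

from mlx_lm import load, generate

model, tokenizer = load("Josiefied-Qwen3-30B-A3B-abliterated-v2-DWQ-4bit")

messages = [
    # Guide the model toward long-form writing and an explicit stopping point
    {"role": "system", "content": "You are a creative writer. Write the complete story in one response and stop when it is finished."},
    # /no_think switches off the Qwen3 reasoning block for this turn
    {"role": "user", "content": "Write a 10000 word story. /no_think"},
]

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
text = generate(model, tokenizer, prompt=prompt, max_tokens=40960, verbose=True)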

DWQs for the 8B and smaller models are coming though :D
