Outputs senseless text, unlike the non-QAT 27b abliterated Q6_K_M
I tried a few story generations. The texts come out senseless, with wrong words, adjectives, nouns, etc. The settings are the same for all versions and follow the official recommendations. Running in LM Studio.
Which version did you use with gemma-3-27b-it-qat-abliterated? Was it Q6_K_M too?
gemma-3-27b-it-qat-abliterated Q4_K_M 16.55 GB
temperature: 1
top_k: 64
top_p: 0.95
min_p: 0.05
LM Studio 0.3.16.
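For anyone trying to reproduce this outside LM Studio, here is a rough equivalent of these sampler settings using llama-cpp-python. This is only a sketch: the model filename is hypothetical, and it assumes a recent llama-cpp-python build that exposes the min_p parameter.

```python
# Rough reproduction of the sampler settings above, outside LM Studio.
# Assumes llama-cpp-python is installed and the GGUF file is local (hypothetical filename).
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-3-27b-it-qat-abliterated-Q4_K_M.gguf",  # hypothetical local path
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short story about a sunrise."}],
    temperature=1.0,   # official Gemma 3 recommendations, as listed above
    top_k=64,
    top_p=0.95,
    min_p=0.05,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```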
(Incidentally, even the non-QAT Q6_K_M version, when prompted in Russian, begins outputting tokens like <unused11><unused28><mask><unused14><unused41><unused43> after 1-2 messages.
Sometimes regenerating the reply works and the unused tokens disappear.)
P.S. I was surprised to see Q6 and Q8 QAT versions in the list, while the original Gemma 3 QAT release was Q4 only.
An example of generated text (translated to English):
Here is an erotic story that has all the above details:
Beginning of the story
On gold and luxury - everything was here, of course. They were, and they were! In them there were no coals and no gold, but they could not die, that's the main thing... Here only a few people stood outside, so that they could see and feel everything.
At this time the morning began, when the first rays of the sun began to fall on the earth.
Of course, it was night here!
Everything was so beautiful, how could you not say everything!
There were many excellent words here, but they were only so that everyone could see and feel everything! And all this time there was one person here: here for us!
That's all, and only because everyone had their own, right? Still. Here and can not be refound, here and there was not!
Here were, of course, did not know what was here. And here were also, if you look at the sky - this place knew what it was...
They were waiting for everything! Everyone saw! Everyone knew!
It all started a long time ago! That's all done!
And all that, and it was and was, that's who these yellow, distant ones are!
That's all. And that's all. And it all started from this place, there were people here!
That's all! It all started!
It's all over!
And it all started!
Everything was as if only for us, for us, and also for us!
All your parameters look reasonable. I'll try to make another version.
Thanks, I'll be glad to test an updated model.
Same issue here; it breaks down into the same sort of patterns after only a few sentences.
QAT stands for Quantization-Aware Training; in this case it's 5,000 steps of post-training applied to the IT (instruction-tuned) Gemma models while simulating low-precision arithmetic, which theoretically makes the weights robust to quantization. According to DeepMind, each model is fine-tuned with QAT “using probabilities from the non-quantized checkpoint as targets,” yielding checkpoints ready for int4/int8 or FP8 inference.
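For intuition, here is a minimal toy sketch of what that looks like mechanically, not DeepMind's actual recipe or code: weights are fake-quantized to int4 in the forward pass with a straight-through estimator, and the quantized "student" is trained against the full-precision teacher's output probabilities.

```python
# Toy sketch of QAT-as-distillation (illustration only, not the Gemma training code).
import torch
import torch.nn.functional as F

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int4 quantization simulated in float.
    scale = w.abs().max() / 7.0 + 1e-8
    q = torch.clamp(torch.round(w / scale), -8, 7) * scale
    # Straight-through estimator: forward uses the quantized weights,
    # backward treats quantization as identity so gradients reach w.
    return w + (q - w).detach()

class QATLinear(torch.nn.Linear):
    def forward(self, x):
        return F.linear(x, fake_quant_int4(self.weight), self.bias)

# Teacher = full-precision checkpoint; student starts from the same weights
# but sees fake-quantized weights during training.
teacher = torch.nn.Linear(16, 32)
student = QATLinear(16, 32)
student.load_state_dict(teacher.state_dict())
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

# One training step: match the teacher's output distribution (soft targets),
# analogous to "using probabilities from the non-quantized checkpoint as targets".
x = torch.randn(8, 16)
with torch.no_grad():
    target_probs = F.softmax(teacher(x), dim=-1)
loss = F.kl_div(F.log_softmax(student(x), dim=-1), target_probs, reduction="batchmean")
opt.zero_grad()
loss.backward()
opt.step()
```

The real thing obviously operates on the full model with a production quantization scheme; this only illustrates the fake-quantize-then-distill mechanics.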
To me, while it's an interesting research direction, it feels like too many cooks in the kitchen rather than a 'let me help you help me' situation, because we already have context-aware quantization, and the entire point of quantization is preserving the intended model with minimal loss. Say QAT explodes in use as a post-training standard: great, but quantization technology may then stagnate because something else intends to pick up the slack by being robust against quantization error. Or worse, future quant methods with a unique approach aren't effective on QAT models and turn inference into a nonsensical garbled mess, discouraging further work on quantization. It's like the NCIS episode where two people use the same keyboard to try to stop a hack.

Let quantization cook: QAT post-training may reinforce certain behaviors and pitfalls so they hold on more strongly, and needless fine-tuning effectively just results in more catastrophic forgetting. That way experiments like Abliteration don't get skewed because the model target is QAT, i.e. the only modification the model is intended to receive afterward is one tailored to resisting compression loss.
I am happy to see Abliteration tested on it, however. I could be completely wrong in my concerns about QAT; after all, deep learning AI is perpetually a research project.