IQ4: Difference between Plus, Plus2 and Super

#3
by UsernamePartialName - opened

Could you please elaborate on the practical differences between these quants? From the Readme:

Max Plus: Imatrixed NEO Quant with minor adjustments + larger Output tensor/embed.
Max Plus 2: Imatrixed NEO Quant with minor adjustments + 16 bit output tensor.
Max Super: Imatrixed NEO Quant with Q6 adjustments + embed at Q6 + 16 bit output tensor.

Owner

The output tensor has the largest effect here: more bits mean better, stronger performance in both reasoning and general operation.
This covers the first two (Plus and Plus 2).
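The "more bits, less loss" point can be illustrated with a small sketch. It uses plain symmetric round-to-nearest block quantization as a simplified stand-in. The real IQ4/Q6 GGUF formats use different, more elaborate schemes. The random weights, the block size of 32, and the function names are hypothetical and chosen only for illustration.

```python
import random

def quantize_dequantize(block, bits):
    """Simplified stand-in (not the real IQ4/Q6 format): symmetric
    round-to-nearest quantization of one block to `bits` bits, then back to float."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 31 for 6-bit
    scale = max(abs(w) for w in block) / qmax
    if scale == 0:
        return list(block)  # an all-zero block is stored exactly
    return [round(w / scale) * scale for w in block]

def rms_error(weights, bits, block_size=32):
    """Root-mean-square reconstruction error over all blocks (hypothetical block size)."""
    total = 0.0
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        deq = quantize_dequantize(block, bits)
        total += sum((a - b) ** 2 for a, b in zip(block, deq))
    return (total / len(weights)) ** 0.5

# Hypothetical weights: small Gaussian values, roughly like a trained layer.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

for bits in (4, 6, 8):
    print(f"{bits}-bit RMS error: {rms_error(weights, bits):.6f}")
```

The error shrinks steadily as bits are added. A 16-bit float output tensor goes further and keeps that tensor essentially unquantized, which is why upgrading the output tensor (and, in Super, the embeddings and experts) costs file size but reduces loss.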

"Super" means that specific tensors (in this case, the experts) get more bits, which raises performance again across the board.

High precision means stronger, more nuanced reasoning. For the output (after the reasoning block(s)), it means the overall output will be stronger relative to quants with fewer bits.

You will notice the "extra bits" on longer prompts, longer output generation and more complex prompts.

For creative use cases: more detail, better prose and sentence structure, and so on.
