weighted/imatrix or static quants for non-english languages?
Hello king of ggufs, according to what I read, weighted/imatrix quants usually offer significantly better model quality than classical static quants at the same quantization level and are therefore almost always recommended.
BUT does this also apply to non-English chat? I have heard weighted/imatrix quants can affect multilingualism. Or does it not matter anyway because models always think in English internally, so the input and output language plays no role?
Should it be done like this?
- english chat -> weighted/imatrix
- non-english chat -> static quants
Or is weighted/imatrix ALWAYS better?
In one study, weighted quants showed much better quality even when an English dataset was used for importance matrix training. It even indicated that an English imatrix dataset gave better results than using the same language as during inference, though the difference was too small to be statistically significant. I think the reason an English imatrix dataset is superior even for those use cases is that most base models are primarily trained on English.
I even managed to find the paper again! It is titled "English K_Quantization of LLMs Does Not Disproportionately Diminish Multilingual Performance" and available at https://arxiv.org/abs/2503.03592
Quote from conclusion:
Further, the usage of importance matrices written in non-English does not significantly improve performance on non-English datasets and might in fact slightly harm it. However, this reduction in performance is not statistically significant.
Thanks, that's interesting and could lead to the general rule (for common applications with common models): "always imatrix".
Perhaps it would also be possible to test imatrix vs. static quants for a target language by measuring perplexity on input text written in that non-English target language.
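A minimal sketch of how such a comparison could be run with llama.cpp's perplexity tool (the model and text file names are placeholders; the binary is called `llama-perplexity` in recent llama.cpp builds, `perplexity` in older ones):

```sh
# Compare an imatrix quant against a static quant of the same model,
# at the same quantization level, on raw text in the target language.
# german.txt is a placeholder for a plain-text file in that language.

# weighted/imatrix quant
./llama-perplexity -m model-i1-Q4_K_M.gguf -f german.txt

# static quant
./llama-perplexity -m model-Q4_K_M.gguf -f german.txt

# The quant that reports the lower perplexity preserves the model's
# predictions on that language better (lower is better).
```

Perplexity on a single text file is only a rough proxy for chat quality, but it should at least show whether the imatrix quant degrades noticeably on the target language compared to the static one.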