weighted/imatrix or static quants for non-english languages?
Hello king of ggufs, according to what I read, weighted/imatrix quants usually offer significantly better model quality than classical static quants at the same quantization level and are therefore almost always recommended.
BUT does this also apply to non-English chat? I have heard weighted/imatrix quants can affect multilingualism. Or does it not matter anyway because models always think in English internally, so the input and output language plays no role?
Should it be done like this?
- english chat -> weighted/imatrix
- non-english chat -> static quants
Or is weighted/imatrix ALWAYS better?
In one study, weighted quants showed much better quality even when an English dataset was used for importance matrix training. It even indicated that an English imatrix dataset gave better results than using the same language as during inference, though the difference was too small to be statistically significant. I think the reason an English imatrix dataset is superior even for those use cases is that most base models are primarily trained on English.
I even managed to find the paper again! It is titled "English K_Quantization of LLMs Does Not Disproportionately Diminish Multilingual Performance" and available at https://arxiv.org/abs/2503.03592
Quote from conclusion:
Further, the usage of importance matrices written in non-English does not significantly improve performance on non-English datasets and might in fact slightly harm it. However, this reduction in performance is not statistically significant.
Thanks, that's interesting and could lead to the general rule (for common applications with common models): "always imatrix".
Perhaps it would also be possible to test imatrix vs. static quants for a target language by measuring perplexity on input text written in that non-English target language.
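A minimal sketch of how such a comparison could be run with llama.cpp's perplexity tool (the model and text file names are placeholders; the binary is called `llama-perplexity` in recent llama.cpp builds, `perplexity` in older ones):

```sh
# Compare an imatrix quant against a static quant of the same model,
# at the same quantization level, on raw text in the target language.
# german.txt is a placeholder for a plain-text file in that language.

# weighted/imatrix quant
./llama-perplexity -m model-i1-Q4_K_M.gguf -f german.txt

# static quant
./llama-perplexity -m model-Q4_K_M.gguf -f german.txt

# The quant that reports the lower perplexity preserves the model's
# predictions on that language better (lower is better).
```

Perplexity on a single text file is only a rough proxy for chat quality, but it should at least show whether the imatrix quant degrades noticeably on the target language compared to the static one.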