Imatrix file
I feel stupid for asking, but I've only now noticed that all your quants are imatrix. Do I need to download the imatrix file itself too? I'm asking because I remember getting weird output on other Mistral models with mradermacher's imatrix quants (but not the static ones, those worked fine).
no you do not :) the imatrix only alters the quantization process while the quants are being created
I only provide the file for reference and repeatability (rough sketch of the flow below)
if you notice degraded performance do share, because it's valuable information since imatrix should explicitly improve quality
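For anyone curious where the file actually gets used, here's a rough sketch of the flow. This assumes a recent llama.cpp build; the llama-imatrix / llama-quantize tool names and flags can differ between versions, and all file names below are placeholders:

```python
# Rough sketch of the imatrix flow (placeholder paths; tool names/flags
# assume a recent llama.cpp build and may differ between versions).
import subprocess

MODEL_F16 = "model-f16.gguf"      # full-precision source model (placeholder)
CALIB_TXT = "calibration.txt"     # any diverse calibration text (placeholder)
IMATRIX   = "imatrix.dat"         # the file published alongside the quants

# Step 1: run the calibration text through the model and record activation
# statistics per tensor -- this is what produces the imatrix file.
subprocess.run(["llama-imatrix", "-m", MODEL_F16, "-f", CALIB_TXT, "-o", IMATRIX],
               check=True)

# Step 2: quantize as usual, handing the imatrix to the quantizer so it can
# weight its rounding decisions by how strongly each weight was exercised.
subprocess.run(["llama-quantize", "--imatrix", IMATRIX,
                MODEL_F16, "model-Q5_K_M.gguf", "Q5_K_M"],
               check=True)
```

The imatrix only influences step 2; nothing in the resulting GGUF points back to it at inference time, which is why there is nothing extra to download.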
There's something wrong with this model (probably on Mistral AI's side, because it's the same on your Q5_K_L quant and on the Q5_K_S from here): regenerating a message goes a bit wrong and the new generation starts as if in the middle of a sentence. It's only noticeable if you do it in llama.cpp (not the server, the CLI version, i.e. through the KV cache), but it definitely affects overall quality.
Rolled back to Mistral-Small-24B-Instruct-2501-Q4_K_L (also your quantization, from earlier), and the issue doesn't show up; all fine. Tested with the same system instructions and the same prompts. huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-Q5_K_L worked fine too.
Seems like Mistral AI messed up something in the configs.
if you notice degraded performance do share, because it's valuable information since imatrix should explicitly improve quality
I've been trying to find objective advantages/disadvantages of imatrix quants, but haven't found anything definite. Do they always improve quality? I saw someone saying they didn't use them because the result depends on the dataset used during quantization, and they claimed quality would be worse if that dataset was different from your own usage.
@JohanAR this has been mentioned a lot in the past but I'm fairly confident it's just misinformation carried over from early speculation
there's a decent amount of conclusive data that, for example, the language of the imatrix dataset does not correlate with increased performance in that language. If the language of the data is not important, then it's extremely unlikely anything else is, since that's about the largest change you can make: the characters and words themselves are completely different
this in particular was a very valuable study, which found that across English, Norwegian, and Malayalam datasets there was no correlation between the language chosen and the model's strength in that language. In fact, they concluded that more often than not the different language reduced performance:
https://www.reddit.com/r/LocalLLaMA/comments/1j9ih6e/english_k_quantization_of_llms_does_not/
I think it's extremely likely that the most important factor for an imatrix is diversity rather than focus. If you have sufficiently noisy data, it should do a reasonable job of exercising most of the weights and reaching a decent conclusion about which ones are important
The one caveat is that I do believe different imatrix chunk lengths could be relevant, but until we have a better way to test this (compilade is actively working on it) it's too difficult to provide conclusive evidence
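To make the "diversity exercises more weights" point concrete, here's a toy, hypothetical sketch (deliberately simplified, not llama.cpp's actual code): score each input column of one weight matrix by its mean squared activation over the calibration tokens, and compare how many columns a narrow dataset leaves essentially unmeasured versus a diverse one.

```python
# Toy illustration (hypothetical, simplified -- not llama.cpp's actual code):
# score each input column of one weight matrix by its mean squared activation
# over the calibration tokens. Columns that are never activated end up with a
# near-zero score, i.e. the quantizer learns nothing about their importance.
import numpy as np

def accumulate_importance(activations: np.ndarray) -> np.ndarray:
    """activations: (n_tokens, n_cols) inputs feeding one matmul.
    Returns a per-column importance score."""
    return (activations ** 2).mean(axis=0)

rng = np.random.default_rng(0)

# "Focused" data keeps hitting the same few columns...
focused = np.zeros((1024, 64))
focused[:, :8] = rng.normal(size=(1024, 8))

# ...while "diverse" (noisy) data spreads activation across most columns.
diverse = rng.normal(size=(1024, 64))

print("unmeasured columns (focused):",
      int((accumulate_importance(focused) < 1e-6).sum()))   # most of them
print("unmeasured columns (diverse):",
      int((accumulate_importance(diverse) < 1e-6).sum()))   # essentially none
```

Roughly speaking, llama.cpp accumulates similar per-column statistics for each tensor and the quantizer uses them to weight its rounding error, so the takeaway is the same: calibration data that never activates a weight gives the quantizer nothing to go on there.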
Thanks @bartowski !