Imatrix file
I feel stupid for asking, but I've only now noticed that all your quants are imatrix. Do I need to download the imatrix file itself too? I'm asking because I remember getting weird output on other Mistral models with mradermacher's imatrix quants (but not the static ones, those worked fine).
no you do not :) the imatrix only alters the quantization process while the quants are being created
I only provide the file for reference and repeatability (rough sketch of the flow below)
if you notice degraded performance do share, because it's valuable information since imatrix should explicitly improve quality
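For anyone curious where the file actually gets used, here's a rough sketch of the flow. This assumes a recent llama.cpp build; the llama-imatrix / llama-quantize tool names and flags can differ between versions, and all file names below are placeholders:

```python
# Rough sketch of the imatrix flow (placeholder paths; tool names/flags
# assume a recent llama.cpp build and may differ between versions).
import subprocess

MODEL_F16 = "model-f16.gguf"      # full-precision source model (placeholder)
CALIB_TXT = "calibration.txt"     # any diverse calibration text (placeholder)
IMATRIX   = "imatrix.dat"         # the file published alongside the quants

# Step 1: run the calibration text through the model and record activation
# statistics per tensor -- this is what produces the imatrix file.
subprocess.run(["llama-imatrix", "-m", MODEL_F16, "-f", CALIB_TXT, "-o", IMATRIX],
               check=True)

# Step 2: quantize as usual, handing the imatrix to the quantizer so it can
# weight its rounding decisions by how strongly each weight was exercised.
subprocess.run(["llama-quantize", "--imatrix", IMATRIX,
                MODEL_F16, "model-Q5_K_M.gguf", "Q5_K_M"],
               check=True)
```

The imatrix only influences step 2; nothing in the resulting GGUF points back to it at inference time, which is why there is nothing extra to download.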
There's something wrong with this model (probably on Mistral AI's side, because it's the same on your Q5_K_L quant and on the Q5_K_S from here): regenerating a message goes a bit wrong and the new generation starts as if in the middle of a sentence. It's only noticeable if you do it in llama.cpp (not the server, the CLI version, i.e. through the KV cache), but it definitely affects overall quality.
Rolled back to Mistral-Small-24B-Instruct-2501-Q4_K_L (also your quantization, from earlier), and the issue doesn't show up; all fine. Tested with the same system instructions and the same prompts. huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-Q5_K_L worked fine too.
Seems like Mistral AI messed up something in the configs.
if you notice degraded performance do share, because it's valuable information since imatrix should explicitly improve quality
I've been trying to find objective advantages/disadvantages of imatrix quants, but haven't found anything definite. Do they always improve quality? I saw someone saying they didn't use them because the result depends on the dataset used during quantization, and they claimed quality would be worse if that dataset was different from your own usage.
@JohanAR this has been mentioned a lot in the past but I'm fairly confident it's just misinformation carried over from early speculation
there's a decent amount of conclusive data that, for example, the language of the imatrix dataset does not correlate with increased performance in that language. If the language of the data is not important, then it's extremely unlikely anything else is, since that's about the largest change you can make: the characters and words themselves are completely different
this in particular was a very valuable study, which found that across English, Norwegian, and Malayalam datasets there was no correlation between the language chosen and the model's strength in that language. In fact, they concluded that more often than not the different language reduced performance:
https://www.reddit.com/r/LocalLLaMA/comments/1j9ih6e/english_k_quantization_of_llms_does_not/
I think it's extremely likely that the most important factor for an imatrix is diversity rather than focus. If you have sufficiently noisy data, it should do a reasonable job of exercising most of the weights and reaching a decent conclusion about which ones are important
The one caveat is that I do believe different imatrix chunk lengths could be relevant, but until we have a better way to test this (compilade is actively working on it) it's too difficult to provide conclusive evidence
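To make the "diversity exercises more weights" point concrete, here's a toy, hypothetical sketch (deliberately simplified, not llama.cpp's actual code): score each input column of one weight matrix by its mean squared activation over the calibration tokens, and compare how many columns a narrow dataset leaves essentially unmeasured versus a diverse one.

```python
# Toy illustration (hypothetical, simplified -- not llama.cpp's actual code):
# score each input column of one weight matrix by its mean squared activation
# over the calibration tokens. Columns that are never activated end up with a
# near-zero score, i.e. the quantizer learns nothing about their importance.
import numpy as np

def accumulate_importance(activations: np.ndarray) -> np.ndarray:
    """activations: (n_tokens, n_cols) inputs feeding one matmul.
    Returns a per-column importance score."""
    return (activations ** 2).mean(axis=0)

rng = np.random.default_rng(0)

# "Focused" data keeps hitting the same few columns...
focused = np.zeros((1024, 64))
focused[:, :8] = rng.normal(size=(1024, 8))

# ...while "diverse" (noisy) data spreads activation across most columns.
diverse = rng.normal(size=(1024, 64))

print("unmeasured columns (focused):",
      int((accumulate_importance(focused) < 1e-6).sum()))   # most of them
print("unmeasured columns (diverse):",
      int((accumulate_importance(diverse) < 1e-6).sum()))   # essentially none
```

Roughly speaking, llama.cpp accumulates similar per-column statistics for each tensor and the quantizer uses them to weight its rounding error, so the takeaway is the same: calibration data that never activates a weight gives the quantizer nothing to go on there.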
Thanks @bartowski !