https://huggingface.co/blog/autoround
AutoRound now supports all GGUF q*_k_s formats, shows a clear advantage in most q4_k_s scenarios, and delivers up to a 2.0x accuracy gain in q2_k_s.
Interesting, yes, I see that https://github.com/intel/auto-round/blob/main/docs/gguf_accuracy.md shows they support some GGUF formats, specifically ones like Q4_K_S, which is available in mainline llama.cpp.
The quants I'm cooking are exclusive to ik_llama.cpp, like IQ4_KS_R4 ... Look closely at the names and you'll see they're not exactly the same; they're a different style of quant.
Any idea what kind of hardware you'd need to use AutoRound with DeepSeek 671B? They do supposedly support the architecture for GGUF, but I'm guessing I couldn't run it, since I don't have access to H100s or any of that jazz.
Also, I've not seen any comparisons between AutoRound and ik's quants, or exl3 or ParetoQ for that matter.
Interesting, yes, but I'm probably not gonna mess with it. Thanks!