GLM 4.5
It will be queued as soon as @mradermacher updates to the latest version of our llama.cpp fork, which I just updated. In the meantime I will manually prepare the GGUFs, as they are so large that manual handling makes sense.
They are all queued and on their way! :D
Some GLM-4.5-Air quants are already uploaded.
Due to the massive size of some of these models, it will take a few days for all quants to be done, especially because GLM 4.5 and GLM 4.5-Base, at 355B parameters each, will require RPC imatrix computation. As I'm a perfectionist, I want to compute the imatrix in full precision, which alone will probably take around 12 hours per model.
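For anyone curious what a distributed full-precision imatrix run looks like, here is a rough sketch. The file names and worker hostnames are placeholders, not our actual setup, and `--rpc` requires a llama.cpp build with the RPC backend plus `rpc-server` instances running on each worker:

```shell
# Sketch of a full-precision imatrix computation offloaded to RPC workers.
# Paths and hostnames below are placeholders.
./llama-imatrix \
  -m GLM-4.5-BF16.gguf \
  -f calibration-data.txt \
  -o GLM-4.5.imatrix \
  --rpc worker1:50052,worker2:50052
```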
You can check for progress at http://hf.tst.eu/status.html or regularly check the model summary page at the following locations for quants to appear:
@mradermacher If we don't want to skip low bits-per-weight quants for GLM 4.5, we need to set the following according to https://github.com/ggml-org/llama.cpp/pull/14939#issuecomment-3153670235:
Just FYI for anyone wanting to create i-quants: as the final layer will not get imatrix data until MTP is supported, it has to be overridden for lower quants to work, e.g. using `--tensor-type 46=iq4_xs` or `--tensor-type 92=iq4_xs`.
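Concretely, that override would be passed to `llama-quantize` along with the imatrix, roughly like this (file names are placeholders; the layer index is 46 or 92 depending on the model, per the linked comment):

```shell
# Sketch: force the final layer to iq4_xs, since it receives no imatrix
# data until MTP is supported and would otherwise break low-bit quants.
# File names below are placeholders.
./llama-quantize \
  --imatrix GLM-4.5.imatrix \
  --tensor-type 92=iq4_xs \
  GLM-4.5-BF16.gguf GLM-4.5-IQ2_XS.gguf IQ2_XS
```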
We will have to requant them anyway once Multi-Token Prediction (MTP) is implemented. I would be fine with skipping them if you don't want to change our quantization standard for these models.
Skipping the low-bit quants seems the right thing, indeed (or not providing any quants). It's nice to have the prospect that it will work some day.