The unsloth BF16 GGUF is used as the source for the IQ1_M/IQ1_S quantizations. Block 46 (blk.46) is not used by llama.cpp, so its weights are quantized to TQ1_0 to keep its memory allocation to a minimum.
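
The blk.46 trick amounts to a per-tensor type override during quantization. Below is a minimal sketch in Python, assuming a recent llama.cpp `llama-quantize` that supports `--imatrix` and `--tensor-type` overrides; the file names, the imatrix, and the exact pattern syntax are assumptions, not the command actually used for this repository.

```python
import subprocess

# Assumed file names, not the originals used for this repo.
SRC = "GLM-4.5-Air-BF16.gguf"   # unsloth BF16 GGUF source
DST = "GLM-4.5-Air-IQ1_M.gguf"
IMATRIX = "imatrix.dat"         # importance matrix for low-bit quants

cmd = [
    "llama-quantize",
    "--imatrix", IMATRIX,
    # Force every tensor in the unused block 46 down to TQ1_0 so it takes
    # as little space as possible (pattern syntax is an assumption).
    "--tensor-type", "blk.46.=tq1_0",
    SRC, DST, "IQ1_M",
]
subprocess.run(cmd, check=True)
```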


Added MXFP4 versions:

  1. MXFP4: the embedding and output tensors are kept at Q6_K, and the attention layers use IQ4_XS. All FFN expert layers, including the shared experts, are quantized to SOTA MXFP4 (see the sketch after this list).
  2. MXFP4 Max: the embedding, output, and attention layers are kept at Q6_K, and the first layer is kept at full precision. The remaining FFN expert layers are quantized to SOTA MXFP4, while the shared expert weights stay in BF16.
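For reference, the first MXFP4 recipe can be approximated with per-tensor overrides in the same way. The sketch below assumes a llama.cpp build that accepts MXFP4 as a `--tensor-type` target and uses the usual GGUF MoE tensor names (`ffn_*_exps`, `ffn_*_shexp`); these names, the base type, and the file names are assumptions rather than the author's exact settings.

```python
import subprocess

SRC = "GLM-4.5-Air-BF16.gguf"   # assumed source file
DST = "GLM-4.5-Air-MXFP4.gguf"
IMATRIX = "imatrix.dat"

overrides = {
    # Attention projections -> IQ4_XS
    "attn_q": "iq4_xs",
    "attn_k": "iq4_xs",
    "attn_v": "iq4_xs",
    "attn_output": "iq4_xs",
    # Routed FFN experts -> MXFP4
    "ffn_up_exps": "mxfp4",
    "ffn_gate_exps": "mxfp4",
    "ffn_down_exps": "mxfp4",
    # Shared experts -> MXFP4 (the "MXFP4 Max" variant keeps these in BF16)
    "ffn_up_shexp": "mxfp4",
    "ffn_gate_shexp": "mxfp4",
    "ffn_down_shexp": "mxfp4",
}

cmd = [
    "llama-quantize",
    "--imatrix", IMATRIX,
    "--token-embedding-type", "q6_k",  # embedding kept at Q6_K
    "--output-tensor-type", "q6_k",    # output head kept at Q6_K
]
for pattern, ggml_type in overrides.items():
    cmd += ["--tensor-type", f"{pattern}={ggml_type}"]
# Base ftype for anything not overridden (an assumption).
cmd += [SRC, DST, "Q6_K"]
subprocess.run(cmd, check=True)
```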
GGUF · Model size: 110B params · Architecture: glm4moe · Quantizations provided: 1-bit, 2-bit, 3-bit, and 4-bit
