Quantized to IQ1_M/S from Unsloth's BF16 GGUF. blk.46 is not used by llama.cpp, so its weights are quantized to TQ1_0 to keep its memory footprint to a minimum.
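For intuition, here is a minimal Python sketch of ternary round-to-nearest, the idea behind TQ1_0's ~1.69 bits per weight. This is a simplification: the real TQ1_0 additionally packs five trits per byte in llama.cpp's block layout, which is omitted here.

```python
# Minimal sketch of ternary quantization, the rounding idea behind TQ1_0:
# each weight is mapped to -1, 0, or +1 and rescaled by a per-block scale.
# Simplified; the actual TQ1_0 format also packs five trits per byte.
import numpy as np

def ternary_quantize(block: np.ndarray) -> tuple[float, np.ndarray]:
    """Return (scale, trits) with trits in {-1, 0, +1}."""
    scale = float(np.abs(block).max())
    if scale == 0.0:
        return 0.0, np.zeros_like(block, dtype=np.int8)
    # block / scale lies in [-1, 1]; rounding yields -1, 0, or +1.
    return scale, np.round(block / scale).astype(np.int8)

def ternary_dequantize(scale: float, trits: np.ndarray) -> np.ndarray:
    return scale * trits.astype(np.float32)

weights = np.random.randn(256).astype(np.float32)  # one 256-weight block
scale, trits = ternary_quantize(weights)
print("mean abs error:", np.abs(weights - ternary_dequantize(scale, trits)).mean())
```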
Added MXFP4 versions:
- MXFP4: Embedding and output tensors are kept at Q6_K. The attention layers use IQ4_XS. All FFN expert layers, including the shared experts, are quantized to SOTA MXFP4 (sketched after this list).
- MXFP4 Max: Embedding, output, and attention layers are kept at Q6_K. The first layer is kept at full precision. The remaining FFN expert layers are quantized to SOTA MXFP4, while the shared expert weights stay in BF16.
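For reference, below is a minimal Python sketch of the MXFP4 format these variants use, assuming the standard OCP Microscaling layout (blocks of 32 elements, a shared power-of-two scale, FP4 E2M1 elements). The actual llama.cpp kernels differ in bit packing and rounding details.

```python
# Minimal sketch of MXFP4 (OCP Microscaling FP4) block quantization:
# 32 values share one power-of-two scale, and each element is rounded
# to the nearest FP4 E2M1 value. Illustrative only.
import numpy as np

E2M1_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # FP4 magnitudes

def mxfp4_quantize(block: np.ndarray) -> tuple[int, np.ndarray]:
    """Return (shared exponent e, quantized values) for one 32-element block."""
    assert block.size == 32
    amax = float(np.abs(block).max())
    # Shared scale 2**e chosen so the largest magnitude lands near 6.0,
    # the top of the E2M1 range (6 = 1.5 * 2**2, hence the "- 2").
    e = int(np.floor(np.log2(amax))) - 2 if amax > 0 else 0
    scaled = block / 2.0**e
    # Snap each magnitude to the nearest representable E2M1 level.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_LEVELS[None, :]).argmin(axis=1)
    return e, np.copysign(E2M1_LEVELS[idx], block)

def mxfp4_dequantize(e: int, q: np.ndarray) -> np.ndarray:
    return q * 2.0**e

block = np.random.randn(32).astype(np.float32)
e, q = mxfp4_quantize(block)
print(f"shared scale 2^{e}, mean abs error:", np.abs(block - mxfp4_dequantize(e, q)).mean())
```

The shared power-of-two exponent gives each 32-value block its own dynamic range, which is what makes 4-bit storage workable for the large FFN expert tensors while the more sensitive embedding, output, and attention tensors stay at higher precision.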
Base model: zai-org/GLM-4.5-Air