Add IQ1_KT (with iq4_nl ffn_down_exps lmao)
- README.md +60 -0
- images/perplexity.png +2 -2
README.md
CHANGED
@@ -347,6 +347,66 @@ numactl -N 0 -m 0 \
 
 </details>
 
+## IQ1_KT 36.039 GiB (2.802 BPW)
+Final estimate: PPL = 5.8214 +/- 0.03767
+
+<details>
+
+<summary>👈 Secret Recipe</summary>
+
+```bash
+#!/usr/bin/env bash
+
+custom="
+# 47 Repeating Layers [0-46]
+# Note: All ffn_down.* layers are not divisible by 256 so have limited quantization options.
+
+# Attention
+blk\..*\.attn_q.*=iq4_kt
+blk\..*\.attn_k.*=iq4_kt
+blk\..*\.attn_v.*=iq4_kt
+blk\..*\.attn_output.*=iq4_kt
+
+# First 1 Dense Layers [0]
+blk\..*\.ffn_down\.weight=iq4_nl
+blk\..*\.ffn_(gate|up)\.weight=iq4_kt
+
+# Shared Expert Layers [1-46]
+blk\..*\.ffn_down_shexp\.weight=iq4_nl
+blk\..*\.ffn_(gate|up)_shexp\.weight=iq4_kt
+
+# Routed Experts Layers [1-46]
+blk\..*\.ffn_down_exps\.weight=iq4_nl
+blk\..*\.ffn_(gate|up)_exps\.weight=iq1_kt
+
+# NextN MTP Layer [46]
+blk\..*\.nextn\.embed_tokens\.weight=iq4_kt
+blk\..*\.nextn\.shared_head_head\.weight=iq4_kt
+blk\..*\.nextn\.eh_proj\.weight=q8_0
+
+# Non-Repeating Layers
+token_embd\.weight=iq4_k
+output\.weight=iq6_k
+"
+
+custom=$(
+echo "$custom" | grep -v '^#' | \
+sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+
+numactl -N 1 -m 1 \
+./build/bin/llama-quantize \
+--custom-q "$custom" \
+--imatrix /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/imatrix-GLM-4.5-Air-BF16.dat \
+/mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-128x9.4B-BF16-00001-of-00005.gguf \
+/mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-IQ1_KT.gguf \
+IQ1_KT \
+192
+```
+
+</details>
+
+
 ## Quick Start
 If you want to disable thinking, add `/nothink` (correct, no underscore) at the *end* of your prompt.
 
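Reviewer note: the `grep`/`sed` step in the added recipe is what turns the commented, multi-line rule list into the single comma-separated string handed to `--custom-q`. A minimal sketch of that step on its own follows. The two-rule `demo` list and the `flat` variable are hypothetical names used only for illustration, and the `-z` flag assumes GNU sed.

```bash
#!/usr/bin/env bash

# Hypothetical, shortened rule list (illustration only, not the full recipe).
demo="
# Attention
blk\..*\.attn_q.*=iq4_kt

# Routed Experts
blk\..*\.ffn_down_exps\.weight=iq4_nl
"

flat=$(
  # 1. grep -v '^#' drops every comment line.
  # 2. sed -z reads the whole input as one record, so \n+ can match runs
  #    of newlines: each run becomes a single comma, then the trailing
  #    and leading commas are trimmed.
  echo "$demo" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

echo "$flat"
```

Because comment lines are matched with `^#`, this only works if the rules and comments inside the quoted string start in column 0; an indented comment would survive the filter and end up inside the rule string.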
images/perplexity.png
CHANGED