ubergarm committed on
Commit 14e974e · 1 Parent(s): c738943

Add IQ1_KT (with iq4_nl ffn_down_exps lmao)

Files changed (2)
  1. README.md +60 -0
  2. images/perplexity.png +2 -2
README.md CHANGED
@@ -347,6 +347,66 @@ numactl -N 0 -m 0 \

  </details>

+ ## IQ1_KT 36.039 GiB (2.802 BPW)
+ Final estimate: PPL = 5.8214 +/- 0.03767
+
+ <details>
+
+ <summary>👈 Secret Recipe</summary>
+
+ ```bash
+ #!/usr/bin/env bash
+
+ custom="
+ # 47 Repeating Layers [0-46]
+ # Note: All ffn_down.* layers are not divisible by 256 so have limited quantization options.
+
+ # Attention
+ blk\..*\.attn_q.*=iq4_kt
+ blk\..*\.attn_k.*=iq4_kt
+ blk\..*\.attn_v.*=iq4_kt
+ blk\..*\.attn_output.*=iq4_kt
+
+ # First 1 Dense Layers [0]
+ blk\..*\.ffn_down\.weight=iq4_nl
+ blk\..*\.ffn_(gate|up)\.weight=iq4_kt
+
+ # Shared Expert Layers [1-46]
+ blk\..*\.ffn_down_shexp\.weight=iq4_nl
+ blk\..*\.ffn_(gate|up)_shexp\.weight=iq4_kt
+
+ # Routed Experts Layers [1-46]
+ blk\..*\.ffn_down_exps\.weight=iq4_nl
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq1_kt
+
+ # NextN MTP Layer [46]
+ blk\..*\.nextn\.embed_tokens\.weight=iq4_kt
+ blk\..*\.nextn\.shared_head_head\.weight=iq4_kt
+ blk\..*\.nextn\.eh_proj\.weight=q8_0
+
+ # Non-Repeating Layers
+ token_embd\.weight=iq4_k
+ output\.weight=iq6_k
+ "
+
+ custom=$(
+   echo "$custom" | grep -v '^#' | \
+   sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+ )
+
+ numactl -N 1 -m 1 \
+ ./build/bin/llama-quantize \
+     --custom-q "$custom" \
+     --imatrix /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/imatrix-GLM-4.5-Air-BF16.dat \
+     /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-128x9.4B-BF16-00001-of-00005.gguf \
+     /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-IQ1_KT.gguf \
+     IQ1_KT \
+     192
+ ```
+
+ </details>
+
+
  ## Quick Start
  If you want to disable thinking, add `/nothink` (correct, no underscore) at the *end* of your prompt.
 
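Reviewer note: the `grep`/`sed` step in the added recipe turns the commented, one-rule-per-line `custom` block into the single comma-separated string that is passed to `--custom-q`. The following is a minimal sketch of that step on a shortened, hypothetical two-rule block. The variable names `demo` and `flat` are illustrative and not part of the commit, and `sed -z` assumes GNU sed.

```bash
#!/usr/bin/env bash

# Hypothetical, shortened rule block (two rules taken from the recipe).
demo="
# Attention
blk\..*\.attn_q.*=iq4_kt

# Routed Experts
blk\..*\.ffn_down_exps\.weight=iq4_nl
"

# Same pipeline as the recipe:
#   grep -v '^#'  drops lines that begin with '#'
#   sed -Ez       reads the whole input as one record (-z), then
#                 collapses each run of newlines into a single comma
#                 and trims a trailing and a leading comma
flat=$(
  echo "$demo" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

printf '%s\n' "$flat"
# prints: blk\..*\.attn_q.*=iq4_kt,blk\..*\.ffn_down_exps\.weight=iq4_nl
```

Because `grep -v '^#'` only drops lines whose first character is `#`, comment lines inside the quoted block must not be indented, or they would end up in the rule string.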
images/perplexity.png CHANGED

Git LFS Details

  • SHA256: 806f7a09cd166d33741bb7ba61cdd41f4bc709cc667f5c6025b7598a42df7585
  • Pointer size: 131 Bytes
  • Size of remote file: 133 kB

Git LFS Details

  • SHA256: 1137af525d3f59db2e6948c6512a4ec741ae7127d1dda39c095c8ca95b61e369
  • Pointer size: 131 Bytes
  • Size of remote file: 140 kB