ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF


ubergarm committed 2 days ago

Commit 00a2fee · 1 Parent(s): 86b5619

Add IQ4_K to fill the gap

Files changed (1)

README.md +46 -0

README.md CHANGED

@@ -80,6 +80,52 @@ numactl -N 0 -m 0 \
 </details>

 </details>
+## `IQ4_K` 134.183 GiB (4.903 BPW)
+Final estimate: PPL = 4.3668 +/- 0.02594
+<details>
+<summary>👈 Secret Recipe</summary>
+```bash
+#!/usr/bin/env bash
+# Repeating Layers [0-93]
+custom="
+# Attention
+blk\..*\.attn_q.*=iq6_k
+blk\..*\.attn_k.*=q8_0
+blk\..*\.attn_v.*=q8_0
+blk\..*\.attn_output.*=iq6_k
+# Routed Experts
+blk\..*\.ffn_down_exps\.weight=iq5_k
+blk\..*\.ffn_(gate|up)_exps\.weight=iq4_k
+# Token Embedding and Output
+token_embd\.weight=iq6_k
+output\.weight=iq6_k
+"
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+numactl -N 0 -m 0 \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/imatrix-Qwen3-235B-A22B-Instruct-2507-BF16.dat \
+    /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/Qwen3-235B-A22B-Instruct-2507-BF16-00001-of-00010.gguf \
+    /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/Qwen3-235B-A22B-Instruct-2507-IQ4_K.gguf \
+    IQ4_K \
+    192
+```
+</details>
 ## `pure-IQ4_KS` 116.994 GiB (4.275 BPW)
 Final estimate: PPL = 4.4156 +/- 0.02624
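The recipe's `grep`/`sed` step turns the multi-line rule list into the single comma-separated string handed to `--custom-q`. A minimal sketch of just that step, run on a shortened, illustrative two-rule list rather than the full recipe, and assuming GNU `sed` (for `-z`):

```bash
#!/usr/bin/env bash
# Illustrative two-rule list (shortened from the recipe above); comment lines start with '#'.
custom="
# Attention
blk\..*\.attn_k.*=q8_0
# Routed Experts
blk\..*\.ffn_(gate|up)_exps\.weight=iq4_k
"
# Drop comment lines; GNU sed -z then reads the whole input as one record,
# so each run of newlines collapses into a comma, and leading/trailing commas are trimmed.
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)
echo "$custom"
# prints: blk\..*\.attn_k.*=q8_0,blk\..*\.ffn_(gate|up)_exps\.weight=iq4_k
```

Keeping the rules one per line with `#` comments keeps the recipe readable, while the quantizer still receives one flat `regex=type` list.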
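As a quick consistency check on the size and BPW figures, size in GiB × 2^30 × 8 / BPW should come out near the model's ~235B weights. A small sketch using `awk`; the `implied_params` helper is a hypothetical name, and it assumes BPW is total bits divided by total weights:

```bash
#!/usr/bin/env bash
# Hypothetical helper: implied weight count (in billions) from file size (GiB) and BPW.
# Assumes BPW = total bits / total weights, so weights = GiB * 2^30 * 8 / BPW.
implied_params() {
  awk -v gib="$1" -v bpw="$2" \
    'BEGIN { printf "%.1f\n", gib * 2^30 * 8 / bpw / 1e9 }'
}
implied_params 134.183 4.903   # IQ4_K       -> 235.1
implied_params 116.994 4.275   # pure-IQ4_KS -> 235.1
```

Both quants land at roughly 235.1B weights, so the listed sizes and BPW values agree with each other and with the model name.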