ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF


ubergarm committed on Jul 27

Commit f513ad6 · 1 Parent(s): 8542042

Add IQ4_KSS and IQ3_KS

Files changed (1)

README.md +99 -0


@@ -172,6 +172,55 @@ numactl -N 1 -m 1 \
 </details>
 ## `IQ3_K` 106.644 GiB (3.897 BPW)
 Final estimate: PPL = 4.4561 +/- 0.02657
@@ -217,6 +266,56 @@ numactl -N 1 -m 1 \
 </details>
 ## `IQ2_KL` 81.866 GiB (2.991 BPW)
 Final estimate: PPL = 4.7912 +/- 0.02910

 </details>
+## `IQ4_KSS` 115.085 GiB (4.205 BPW)
+Final estimate: PPL = 4.4017 +/- 0.02614
+<details>
+This one is a little funky just for fun. Seems smort!
+<summary>👈 Secret Recipe</summary>
+```bash
+#!/usr/bin/env bash
+# Repeating Layers [0-93]
+custom="
+# Attention
+blk\..*\.attn_q.*=iq6_k
+blk\..*\.attn_k.*=q8_0
+blk\..*\.attn_v.*=q8_0
+blk\..*\.attn_output.*=iq6_k
+# Routed Experts
+blk\.(0|1|2|3)\.ffn_down_exps\.weight=iq5_ks
+blk\.(0|1|2|3)\.ffn_(gate|up)_exps\.weight=iq4_ks
+blk\..*\.ffn_down_exps\.weight=iq4_ks
+blk\..*\.ffn_(gate|up)_exps\.weight=iq4_kss
+# Token Embedding
+token_embd\.weight=iq4_k
+output\.weight=iq6_k
+"
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+numactl -N 0 -m 0 \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/imatrix-Qwen3-235B-A22B-Instruct-2507-BF16.dat \
+    /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/Qwen3-235B-A22B-Instruct-2507-BF16-00001-of-00010.gguf \
+    /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/Qwen3-235B-A22B-Instruct-2507-IQ4_KSS.gguf \
+    IQ4_KSS \
+    192
+```
+</details>
 ## `IQ3_K` 106.644 GiB (3.897 BPW)
 Final estimate: PPL = 4.4561 +/- 0.02657
 </details>
+## `IQ3_KS` 101.308 GiB (3.702 BPW)
+Final estimate: PPL = 4.4915 +/- 0.02685
+<details>
+Another funky smort one!
+<summary>👈 Secret Recipe</summary>
+```bash
+#!/usr/bin/env bash
+# Repeating Layers [0-93]
+custom="
+# Attention
+blk\..*\.attn_q.*=iq6_k
+blk\..*\.attn_k.*=q8_0
+blk\..*\.attn_v.*=q8_0
+blk\..*\.attn_output.*=iq6_k
+# Routed Experts
+blk\.(0|1|2|3)\.ffn_down_exps\.weight=iq5_ks
+blk\.(0|1|2|3)\.ffn_(gate|up)_exps\.weight=iq4_ks
+blk\..*\.ffn_down_exps\.weight=iq4_ks
+blk\..*\.ffn_(gate|up)_exps\.weight=iq3_ks
+# Token Embedding
+token_embd\.weight=iq4_k
+output\.weight=iq6_k
+"
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+numactl -N 0 -m 0 \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/imatrix-Qwen3-235B-A22B-Instruct-2507-BF16.dat \
+    /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/Qwen3-235B-A22B-Instruct-2507-BF16-00001-of-00010.gguf \
+    /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/Qwen3-235B-A22B-Instruct-2507-IQ3_KS.gguf \
+    IQ3_KS \
+    192
+```
+</details>
 ## `IQ2_KL` 81.866 GiB (2.991 BPW)
 Final estimate: PPL = 4.7912 +/- 0.02910
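
Both recipes above run the multi-line `custom` string through the same `grep`/`sed` pipeline before passing it to `--custom-q`. The sketch below isolates that step using two rules copied from the recipe, to show what the flattened value looks like. It assumes GNU `sed` (the `-z` flag is a GNU extension), and the `flat` variable name is made up for this illustration; the recipes reassign `custom` in place.

```bash
#!/usr/bin/env bash
# Two rules copied from the recipe, with comment lines and a blank line
# left in so the cleanup is visible.
custom="
# Attention
blk\..*\.attn_k.*=q8_0

# Token Embedding
token_embd\.weight=iq4_k
"

# grep -v '^#' drops the comment lines.
# sed -z reads the whole input as a single record, so the three
# substitutions can (1) turn each run of newlines into one comma,
# (2) strip the trailing comma, and (3) strip the leading comma.
flat=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

printf '%s\n' "$flat"
# prints: blk\..*\.attn_k.*=q8_0,token_embd\.weight=iq4_k
```

The result is a single comma-separated list of `regex=type` pairs, which keeps the recipe readable in the script while still handing `--custom-q` one argument.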