ubergarm committed
Commit 00a2fee · 1 Parent(s): 86b5619

Add IQ4_K to fill the gap

Files changed (1)
  1. README.md +46 -0
README.md CHANGED
@@ -80,6 +80,52 @@ numactl -N 0 -m 0 \

  </details>

+ ## `IQ4_K` 134.183 GiB (4.903 BPW)
+ Final estimate: PPL = 4.3668 +/- 0.02594
+
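(Editor's note, not part of the commit: the bits-per-weight figure can be roughly sanity-checked from the file size. A minimal sketch, assuming a nominal ~235e9 total weights for Qwen3-235B-A22B; the exact BPW reported above depends on the true tensor element count.)

```bash
# Rough BPW check (assumption: ~235e9 parameters; not part of the original recipe)
# bits per weight = size_in_GiB * 2^30 bytes * 8 bits / parameter_count
echo "134.183 * 1024^3 * 8 / 235000000000" | bc -l   # ~4.90, close to the reported 4.903 BPW
```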
+ <details>
+
+ <summary>👈 Secret Recipe</summary>
+
+ ```bash
+ #!/usr/bin/env bash
+
+ # Repeating Layers [0-93]
+
+ custom="
+ # Attention
+ blk\..*\.attn_q.*=iq6_k
+ blk\..*\.attn_k.*=q8_0
+ blk\..*\.attn_v.*=q8_0
+ blk\..*\.attn_output.*=iq6_k
+
+ # Routed Experts
+ blk\..*\.ffn_down_exps\.weight=iq5_k
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq4_k
+
+ # Token Embedding
+ token_embd\.weight=iq6_k
+ output\.weight=iq6_k
+ "
+
+ custom=$(
+ echo "$custom" | grep -v '^#' | \
+ sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+ )
+
+ numactl -N 0 -m 0 \
+ ./build/bin/llama-quantize \
+ --custom-q "$custom" \
+ --imatrix /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/imatrix-Qwen3-235B-A22B-Instruct-2507-BF16.dat \
+ /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/Qwen3-235B-A22B-Instruct-2507-BF16-00001-of-00010.gguf \
+ /mnt/raid/models/ubergarm/Qwen3-235B-A22B-Instruct-2507-GGUF/Qwen3-235B-A22B-Instruct-2507-IQ4_K.gguf \
+ IQ4_K \
+ 192
+ ```
+
+ </details>
+
+
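(Editor's note, not part of the commit: the `grep -v '^#'` / `sed -Ez` pipeline in the recipe collapses the multi-line `custom` block into the single comma-separated `regex=type` list that `--custom-q` consumes, dropping comment and blank lines. A minimal sketch of previewing that string before launching the full quantization; the expected-output line below is illustrative and elided.)

```bash
# Hypothetical preview step, run after defining and collapsing $custom as in the recipe above
echo "$custom"
# Expected shape of the output: one line of comma-separated regex=type pairs, e.g.
# blk\..*\.attn_q.*=iq6_k,blk\..*\.attn_k.*=q8_0,...,output\.weight=iq6_k
```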
  ## `pure-IQ4_KS` 116.994 GiB (4.275 BPW)
  Final estimate: PPL = 4.4156 +/- 0.02624