ubergarm committed
Commit 726f6c5 · 1 parent: d968e06

Uploading IQ4_K and IQ5_KS

Files changed (2)
  1. README.md +117 -0
  2. images/perplexity.png +2 -2
README.md CHANGED
@@ -100,6 +100,123 @@ numactl -N 0 -m 0 \
 
  </details>
 
+ ## IQ5_KS 72.855 GiB (5.665 BPW)
+ Final estimate: PPL = 4.5948 +/- 0.02815
+
+ <details>
+
+ <summary>👈 Secret Recipe</summary>
+
+ ```bash
+ #!/usr/bin/env bash
+
+ custom="
+ # 47 Repeating Layers [0-46]
+ # Note: All ffn_down.* layers are not divisible by 256 so have limited quantization options.
+
+ # Attention
+ blk\..*\.attn_q.*=iq5_ks
+ blk\..*\.attn_k.*=q8_0
+ blk\..*\.attn_v.*=q8_0
+ blk\..*\.attn_output.*=iq5_ks
+
+ # First 1 Dense Layers [0]
+ blk\..*\.ffn_down\.weight=q6_0
+ blk\..*\.ffn_(gate|up)\.weight=iq5_ks
+
+ # Shared Expert Layers [1-46]
+ blk\..*\.ffn_down_shexp\.weight=q6_0
+ blk\..*\.ffn_(gate|up)_shexp\.weight=iq5_ks
+
+ # Routed Experts Layers [1-46]
+ blk\..*\.ffn_down_exps\.weight=q6_0
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq5_ks
+
+ # NextN MTP Layer [46]
+ blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
+ blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
+ blk\..*\.nextn\.eh_proj\.weight=q8_0
+
+ # Non-Repeating Layers
+ token_embd\.weight=iq4_k
+ output\.weight=iq6_k
+ "
+
+ custom=$(
+ echo "$custom" | grep -v '^#' | \
+ sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+ )
+
+ numactl -N 0 -m 0 \
+ ./build/bin/llama-quantize \
+ --custom-q "$custom" \
+ --imatrix /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/imatrix-GLM-4.5-Air-BF16.dat \
+ /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-128x9.4B-BF16-00001-of-00005.gguf \
+ /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-IQ5_KS.gguf \
+ IQ5_KS \
+ 192
+ ```
+
+ </details>
+
+ ## IQ4_K 62.910 GiB (4.892 BPW)
+ Final estimate: PPL = 4.6273 +/- 0.02839
+
+ <details>
+
+ <summary>👈 Secret Recipe</summary>
+
+ ```bash
+ #!/usr/bin/env bash
+
+ custom="
+ # 47 Repeating Layers [0-46]
+ # Note: All ffn_down.* layers are not divisible by 256 so have limited quantization options.
+
+ # Attention
+ blk\..*\.attn_q.*=iq5_ks
+ blk\..*\.attn_k.*=q8_0
+ blk\..*\.attn_v.*=q8_0
+ blk\..*\.attn_output.*=iq5_ks
+
+ # First 1 Dense Layers [0]
+ blk\..*\.ffn_down\.weight=q6_0
+ blk\..*\.ffn_(gate|up)\.weight=iq5_ks
+
+ # Shared Expert Layers [1-46]
+ blk\..*\.ffn_down_shexp\.weight=q6_0
+ blk\..*\.ffn_(gate|up)_shexp\.weight=iq5_ks
+
+ # Routed Experts Layers [1-46]
+ blk\..*\.ffn_down_exps\.weight=q5_0
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq4_k
+
+ # NextN MTP Layer [46]
+ blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
+ blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
+ blk\..*\.nextn\.eh_proj\.weight=q8_0
+
+ # Non-Repeating Layers
+ token_embd\.weight=iq4_k
+ output\.weight=iq6_k
+ "
+
+ custom=$(
+ echo "$custom" | grep -v '^#' | \
+ sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+ )
+
+ numactl -N 1 -m 1 \
+ ./build/bin/llama-quantize \
+ --custom-q "$custom" \
+ --imatrix /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/imatrix-GLM-4.5-Air-BF16.dat \
+ /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-128x9.4B-BF16-00001-of-00005.gguf \
+ /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-IQ4_K.gguf \
+ IQ4_K \
+ 192
+ ```
+
+ </details>
 
  ## IQ4_KSS 54.801 GiB (4.261 BPW)
  Final estimate: PPL = 4.7056 +/- 0.02909
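
For anyone adapting these recipes, the `custom=$( ... )` step in both scripts only reshapes the readable rule list into the single comma-separated `regex=type` string handed to `--custom-q`: `grep -v '^#'` drops the comment lines and `sed -Ez` joins whatever remains with commas, trimming stray commas left by blank lines. A minimal sketch with a shortened rule list (the tensor regexes are copied from the recipe above; nothing here is part of the commit itself):

```bash
#!/usr/bin/env bash
# Illustration only: flatten a shortened rule list the same way the recipes do.
custom="
# comment lines like this one are dropped by grep
blk\..*\.attn_k.*=q8_0
blk\..*\.attn_v.*=q8_0

output\.weight=iq6_k
"

# grep -v '^#' removes comment lines; sed -Ez treats the remaining text as one
# record, replaces runs of newlines with commas, and strips a leading/trailing comma.
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

echo "$custom"
# -> blk\..*\.attn_k.*=q8_0,blk\..*\.attn_v.*=q8_0,output\.weight=iq6_k
```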
images/perplexity.png CHANGED

Git LFS Details

  • SHA256: e2a522e9a883b09d06ec7e4f54f54a80e6f04f9a1851c18b5f175e5eaeff279d
  • Pointer size: 131 Bytes
  • Size of remote file: 119 kB

Git LFS Details

  • SHA256: 806f7a09cd166d33741bb7ba61cdd41f4bc709cc667f5c6025b7598a42df7585
  • Pointer size: 131 Bytes
  • Size of remote file: 133 kB
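
The `Final estimate: PPL = ...` values above (and the curves in `images/perplexity.png`) come from `llama-perplexity` runs. A rough sketch of such a run; the test corpus, context size, and model path are assumptions rather than details recorded in this commit:

```bash
#!/usr/bin/env bash
# Hypothetical reproduction run: wiki.test.raw location, context size, and the
# quant being measured are assumptions, not recorded in this commit.
./build/bin/llama-perplexity \
  -m /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-IQ5_KS.gguf \
  -f wiki.test.raw \
  --ctx-size 512
# The run ends with a line of the form:
#   Final estimate: PPL = <value> +/- <error>
```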