Add IQ2_KS
README.md CHANGED
@@ -19,8 +19,12 @@ tags:
 - [x] cast fp8 safetensors to bf16 safetensors
 - [x] convert to bf16 GGUF
 - [x] quantize Q8_0 without imatrix
-- [ ] calculate and upload imatrix from Q8_0
-- [ ] begin quantizing and releasing
+- [x] calculate and upload imatrix from Q8_0 (note imatrix is missing data for a few tensors: https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/discussions/1#68bc58de31fa67452e075b9f )
+- [x] begin quantizing and releasing
+- [x] IQ2_KS
+- [ ] smol-IQ4_KSS
+- [ ] smol-IQ2_KL
+- [ ] etc...
 
 Open a discussion if you have a specific target RAM+VRAM in mind for your rig and I'll see what I can do given the available quants. Cheers!
 
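For anyone reproducing the checklist above, the newly checked imatrix step typically reduces to a single importance-matrix run over the Q8_0 GGUF. A minimal sketch, assuming ik_llama.cpp's `llama-imatrix` tool with an illustrative model path and calibration corpus; this is not necessarily the exact command behind the uploaded `.dat`:

```bash
#!/usr/bin/env bash
# Sketch: compute an importance matrix from the Q8_0 quant.
# Assumptions: ik_llama.cpp build tree; the model path and
# calibration_data.txt are illustrative, only the output filename
# is taken from the --imatrix argument in the recipe below.
./build/bin/llama-imatrix \
    -m /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-Instruct-0905-Q8_0.gguf \
    -f calibration_data.txt \
    -o imatrix-Kimi-K2-Instruct-0905-Q8_0.dat \
    --ctx-size 512
```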
@@ -85,15 +89,57 @@ echo TODO
 
 </details>
 
-### `IQ2_KS`
-Final estimate: PPL =
+### `IQ2_KS` 289.820 GiB (2.425 BPW)
+Final estimate: PPL = 3.2478 +/- 0.01721
 
 <details>
 
 <summary>👈 Secret Recipe</summary>
 
 ```bash
-
+#!/usr/bin/env bash
+
+custom="
+## Attention [0-60] (GPU)
+blk\..*\.attn_k_b\.weight=q8_0
+blk\..*\.attn_v_b\.weight=q8_0
+
+# Balance of attn tensors
+blk\..*\.attn_kv_a_mqa\.weight=q8_0
+blk\..*\.attn_q_a\.weight=q8_0
+blk\..*\.attn_q_b\.weight=q8_0
+blk\..*\.attn_output\.weight=q8_0
+
+## First Single Dense Layer [0] (GPU)
+blk\..*\.ffn_down\.weight=q8_0
+blk\..*\.ffn_(gate|up)\.weight=q8_0
+
+## Shared Expert [1-60] (GPU)
+blk\..*\.ffn_down_shexp\.weight=q8_0
+blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0
+
+## Routed Experts [1-60] (CPU)
+blk\..*\.ffn_down_exps\.weight=iq2_kl
+blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks
+
+## Token embedding and output tensors (GPU)
+token_embd\.weight=iq4_k
+output\.weight=iq6_k
+"
+
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+
+numactl -N 1 -m 1 \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/imatrix-Kimi-K2-Instruct-0905-Q8_0.dat \
+    /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-384x14B-Instruct-safetensors-0905-BF16-00001-of-00046.gguf \
+    /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-Instruct-0905-IQ2_KS.gguf \
+    IQ2_KS \
+    192
 ```
 
 </details>
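A note on the recipe above: the `grep`/`sed` pipeline only reshapes the human-readable spec, dropping comment lines and joining the survivors with commas into the single `tensor-regex=type` string that `--custom-q` consumes. The trailing positional arguments `IQ2_KS` and `192` are the fallback quant type for any tensor no rule matches and the thread count, respectively. A standalone reproduction of the transform on a two-rule example (GNU `sed` assumed for `-z`):

```bash
#!/usr/bin/env bash
# Demonstrates the spec-flattening used in the recipe.
spec="
## comment lines are stripped
blk\..*\.ffn_down_exps\.weight=iq2_kl
blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks
"
# grep drops the comments; sed (-z: treat the input as one record) turns
# newline runs into commas and trims any leading/trailing comma.
echo "$spec" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
# prints: blk\..*\.ffn_down_exps\.weight=iq2_kl,blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks
```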
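The PPL figure in the new heading comes from a perplexity run against the finished IQ2_KS file. A sketch of how such a `Final estimate: PPL = ...` line is typically produced, assuming `llama-perplexity` and the customary `wiki.test.raw` corpus (the diff itself does not record the exact corpus or context length used for the published number):

```bash
#!/usr/bin/env bash
# Sketch: measure perplexity of the released IQ2_KS quant.
# Assumptions: llama-perplexity from the same build tree and the
# wiki.test.raw corpus; both are illustrative, not confirmed by the diff.
./build/bin/llama-perplexity \
    -m /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-Instruct-0905-IQ2_KS.gguf \
    -f wiki.test.raw \
    --ctx-size 512
```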