Add IQ2_KS
README.md CHANGED
@@ -19,8 +19,12 @@ tags:
 - [x] cast fp8 safetensors to bf16 safetensors
 - [x] convert to bf16 GGUF
 - [x] quantize Q8_0 without imatrix
-- [ ] calculate and upload imatrix from Q8_0
-- [ ] begin quantizing and releasing
+- [x] calculate and upload imatrix from Q8_0 (note imatrix is missing data for a few tensors: https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/discussions/1#68bc58de31fa67452e075b9f )
+- [x] begin quantizing and releasing
+- [x] IQ2_KS
+- [ ] smol-IQ4_KSS
+- [ ] smol-IQ2_KL
+- [ ] etc...
 
 Open a discussion if you have a specific target RAM+VRAM in mind for your rig and I'll see what I can do given the available quants. Cheers!
 
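For anyone reproducing the checklist above, the newly checked imatrix step typically reduces to a single importance-matrix run over the Q8_0 GGUF. A minimal sketch, assuming ik_llama.cpp's `llama-imatrix` tool with an illustrative model path and calibration corpus; this is not necessarily the exact command behind the uploaded `.dat`:

```bash
#!/usr/bin/env bash
# Sketch: compute an importance matrix from the Q8_0 quant.
# Assumptions: ik_llama.cpp build tree; the model path and
# calibration_data.txt are illustrative, only the output filename
# is taken from the --imatrix argument in the recipe below.
./build/bin/llama-imatrix \
    -m /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-Instruct-0905-Q8_0.gguf \
    -f calibration_data.txt \
    -o imatrix-Kimi-K2-Instruct-0905-Q8_0.dat \
    --ctx-size 512
```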
@@ -85,15 +89,57 @@ echo TODO
 
 </details>
 
-### `IQ2_KS`
-Final estimate: PPL =
+### `IQ2_KS` 289.820 GiB (2.425 BPW)
+Final estimate: PPL = 3.2478 +/- 0.01721
 
 <details>
 
 <summary>👈 Secret Recipe</summary>
 
 ```bash
-
+#!/usr/bin/env bash
+
+custom="
+## Attention [0-60] (GPU)
+blk\..*\.attn_k_b\.weight=q8_0
+blk\..*\.attn_v_b\.weight=q8_0
+
+# Balance of attn tensors
+blk\..*\.attn_kv_a_mqa\.weight=q8_0
+blk\..*\.attn_q_a\.weight=q8_0
+blk\..*\.attn_q_b\.weight=q8_0
+blk\..*\.attn_output\.weight=q8_0
+
+## First Single Dense Layer [0] (GPU)
+blk\..*\.ffn_down\.weight=q8_0
+blk\..*\.ffn_(gate|up)\.weight=q8_0
+
+## Shared Expert [1-60] (GPU)
+blk\..*\.ffn_down_shexp\.weight=q8_0
+blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0
+
+## Routed Experts [1-60] (CPU)
+blk\..*\.ffn_down_exps\.weight=iq2_kl
+blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks
+
+## Token embedding and output tensors (GPU)
+token_embd\.weight=iq4_k
+output\.weight=iq6_k
+"
+
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+
+numactl -N 1 -m 1 \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/imatrix-Kimi-K2-Instruct-0905-Q8_0.dat \
+    /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-384x14B-Instruct-safetensors-0905-BF16-00001-of-00046.gguf \
+    /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-Instruct-0905-IQ2_KS.gguf \
+    IQ2_KS \
+    192
 ```
 
 </details>
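A note on the recipe above: the `grep`/`sed` pipeline only reshapes the human-readable spec, dropping comment lines and joining the survivors with commas into the single `tensor-regex=type` string that `--custom-q` consumes. The trailing positional arguments `IQ2_KS` and `192` are the fallback quant type for any tensor no rule matches and the thread count, respectively. A standalone reproduction of the transform on a two-rule example (GNU `sed` assumed for `-z`):

```bash
#!/usr/bin/env bash
# Demonstrates the spec-flattening used in the recipe.
spec="
## comment lines are stripped
blk\..*\.ffn_down_exps\.weight=iq2_kl
blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks
"
# grep drops the comments; sed (-z: treat the input as one record) turns
# newline runs into commas and trims any leading/trailing comma.
echo "$spec" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
# prints: blk\..*\.ffn_down_exps\.weight=iq2_kl,blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks
```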
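The PPL figure in the new heading comes from a perplexity run against the finished IQ2_KS file. A sketch of how such a `Final estimate: PPL = ...` line is typically produced, assuming `llama-perplexity` and the customary `wiki.test.raw` corpus (the diff itself does not record the exact corpus or context length used for the published number):

```bash
#!/usr/bin/env bash
# Sketch: measure perplexity of the released IQ2_KS quant.
# Assumptions: llama-perplexity from the same build tree and the
# wiki.test.raw corpus; both are illustrative, not confirmed by the diff.
./build/bin/llama-perplexity \
    -m /mnt/data/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-Instruct-0905-IQ2_KS.gguf \
    -f wiki.test.raw \
    --ctx-size 512
```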