ikawrakow committed on
Commit da03cbf · verified · 1 Parent(s): a600698

Update README.md

Files changed (1): README.md (+17 −3)

README.md CHANGED
@@ -1,3 +1,17 @@
- ---
- license: apache-2.0
- ---
+
+ ---
+ license: apache-2.0
+ ---
+
+ This repository contains 3 versions of Qwen3-30B-A3B quantized with `IQ4_KS`. The interesting part is that these models achieve a lower perplexity on
+ `wiki.test.raw` than the original `bf16` model, which is surprising considering that no QAT is mentioned
+ in the [Qwen3 announcement](https://qwenlm.github.io/blog/qwen3/). Hence I'm putting them out there for anyone interested in evaluating performance by means other than PPL,
+ or just using them for local inference.
+ For more details see [this discussion](https://github.com/ikawrakow/ik_llama.cpp/discussions/359).
+
+ **Note**: These models will only work with [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp), as the `IQ4_KS` quantization type is not available in mainline `llama.cpp`.
+
+ The only difference between the 3 models is the imatrix used:
+ * **Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf**: imatrix computed using 500,000 tokens from `wiki.train.raw`
+ * **Qwen3-30B-A3B-IQ4_KS-Bartowski-Imatrix.gguf**: Bartowski's [imatrix](https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-GGUF/blob/main/Qwen_Qwen3-30B-A3B.imatrix)
+ * **Qwen3-30B-A3B-IQ4_KS-Unsloth-Imatrix.gguf**: Unsloth [imatrix](https://huggingface.co/unsloth/Qwen3-30B-A3B-128K-GGUF/blob/main/imatrix_unsloth.dat)