Update README.md
README.md
---
license: apache-2.0
---

EXL3 quants of [Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B)

[2.25 bits per weight](https://huggingface.co/turboderp/Qwen3-30B-A3B-exl3/tree/2.25bpw)
[3.00 bits per weight](https://huggingface.co/turboderp/Qwen3-30B-A3B-exl3/tree/3.0bpw)
[4.00 bits per weight](https://huggingface.co/turboderp/Qwen3-30B-A3B-exl3/tree/4.0bpw)
[5.00 bits per weight](https://huggingface.co/turboderp/Qwen3-30B-A3B-exl3/tree/5.0bpw)
[6.00 bits per weight](https://huggingface.co/turboderp/Qwen3-30B-A3B-exl3/tree/6.0bpw)
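
Each bitrate lives on its own branch of this repo, with branch names matching the links above. As a minimal sketch, a single quant can be fetched with `huggingface_hub` by passing the branch name as the revision; the local directory below is only an example:

```python
# Minimal sketch: download one quant branch of this repo with huggingface_hub.
# Branch names match the links above: 2.25bpw, 3.0bpw, 4.0bpw, 5.0bpw, 6.0bpw.
# The local_dir path is only an example.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="turboderp/Qwen3-30B-A3B-exl3",
    revision="4.0bpw",                      # pick the bitrate you want
    local_dir="Qwen3-30B-A3B-exl3-4.0bpw",  # optional target directory
)
print(local_path)
```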

While I work out a way to meaningfully measure perplexity for such a sparse model, here are some other tests:

| Model    | HumanEval pass@1 | KL-div vs. FP16 (wiki2, 20k tokens) | Top-1 agreement vs. FP16 |
|----------|------------------|-------------------------------------|--------------------------|
| 2.25 bpw | 88.41%           | 0.1416                              | 84.78%                   |
| 3.00 bpw | 89.63%           | 0.0688                              | 89.44%                   |
| 4.00 bpw | 92.07%           | 0.0215                              | 94.33%                   |
| 5.00 bpw | 93.29%           | 0.0094                              | 96.24%                   |
| 6.00 bpw | 92.68%           | 0.0054                              | 97.45%                   |
| FP16     | 91.46%           | -                                   | -                        |
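
For reference, here is a minimal sketch of how the last two columns could be computed, assuming per-token logits from the quantized model and the FP16 baseline over the same evaluation tokens are already available. The function and placeholder tensors below are illustrative, not the exact harness used for the table:

```python
# Minimal sketch (not the exact harness used above): KL divergence and top-1
# agreement between a quantized model and its FP16 reference, given logits of
# shape (num_tokens, vocab_size) computed over the same input tokens.
import torch
import torch.nn.functional as F

def compare_to_fp16(quant_logits: torch.Tensor, fp16_logits: torch.Tensor):
    # Mean per-token KL(FP16 || quant) over the evaluated positions
    log_p_quant = F.log_softmax(quant_logits.float(), dim=-1)
    log_p_fp16 = F.log_softmax(fp16_logits.float(), dim=-1)
    kl = F.kl_div(log_p_quant, log_p_fp16, log_target=True, reduction="batchmean")

    # Fraction of positions where both models agree on the top-1 token
    top1 = (quant_logits.argmax(dim=-1) == fp16_logits.argmax(dim=-1)).float().mean()
    return kl.item(), top1.item()

# Tiny random placeholders; a real run would use actual model logits over the
# ~20k wiki2 token positions and the full vocabulary.
fp16 = torch.randn(512, 4096)
quant = fp16 + 0.1 * torch.randn_like(fp16)
print(compare_to_fp16(quant, fp16))
```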