This repo contains select GGUF quants of [shisa-ai/shisa-v2-llama3.1-405b](https

| Q4_K_M | 227 |
| Q8_0 | 402 |
## Quant Quality
All quants have been tested with JA MT-Bench (judged by GPT-4.1) as a rough guide for quality:
| Quant        | Size (GB) | % Diff | Overall  | Writing  | Roleplay | Reasoning | Math     | Coding   | Extraction | STEM     | Humanities |
|--------------|----------:|-------:|---------:|---------:|---------:|----------:|---------:|---------:|-----------:|---------:|-----------:|
| Full FP16    |       810 |        | **9.13** | 9.25     | **9.55** | 8.15      | 8.90     | 9.10     | 9.65       | 9.10     | 9.35       |
| IQ3_M        |       170 |  -0.99 | 9.04     | 8.90     | 9.45     | 7.75      | 8.95     | 8.95     | 9.70       | **9.15** | 9.50       |
| Q4_K_M       |       227 |  -1.10 | 9.03     | **9.40** | 9.00     | 8.25      | 8.85     | **9.10** | 9.50       | 8.90     | 9.25       |
| Q8_0         |       405 |  -1.20 | 9.02     | **9.40** | 9.05     | **8.30**  | **9.20** | 8.70     | 9.50       | 8.45     | 9.55       |
| W8A8-INT8    |       405 |  -1.42 | 9.00     | 9.20     | 9.35     | 7.80      | 8.75     | 9.00     | 9.80       | 8.65     | 9.45       |
| FP8-Dynamic  |       405 |  -3.29 | 8.83     | 8.70     | 9.20     | 7.85      | 8.80     | 8.65     | 9.30       | 8.80     | 9.35       |
| IQ3_XS       |       155 |  -3.50 | 8.81     | 8.70     | 9.05     | 7.70      | 8.60     | 8.95     | 9.35       | 8.70     | 9.45       |
| IQ4_XS       |       202 |  -3.61 | 8.80     | 8.85     | **9.55** | 6.90      | 8.35     | 8.60     | **9.90**   | 8.65     | **9.60**   |
| *70B FP16*   |       140 |  -7.89 | 8.41     | 7.95     | 9.05     | 6.25      | 8.30     | 8.25     | 9.70       | 8.70     | 9.05       |
| IQ2_XXS      |       100 | -18.18 | 7.47     | 7.50     | 6.80     | 5.15      | 7.55     | 7.30     | 9.05       | 7.65     | 8.80       |

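For reference, the % Diff column appears to be just the relative change in the Overall score versus Full FP16 (this formula is inferred, but it reproduces every value in the table):

```python
# Recomputing the "% Diff" column from the table above.
# Assumption (inferred): % Diff = (quant Overall - FP16 Overall) / FP16 Overall * 100.
FP16_OVERALL = 9.13  # Full FP16 Overall score from the table

overall = {
    "IQ3_M": 9.04, "Q4_K_M": 9.03, "Q8_0": 9.02, "W8A8-INT8": 9.00,
    "FP8-Dynamic": 8.83, "IQ3_XS": 8.81, "IQ4_XS": 8.80,
    "70B FP16": 8.41, "IQ2_XXS": 7.47,
}

def pct_diff(score: float, baseline: float = FP16_OVERALL) -> float:
    """Percent difference vs. the FP16 baseline (negative = worse)."""
    return round((score - baseline) / baseline * 100, 2)

for name, score in overall.items():
    print(f"{name:12s} {pct_diff(score):+.2f}%")  # IQ3_M -> -0.99%, etc.
```
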
Given the margin of error, you could fairly say that the IQ3_M, Q4_K_M, and Q8_0 GGUFs have almost no functional loss versus the FP16. Interestingly, while roleplay takes one of the biggest hits, writing actually seems to improve on the Q4 and Q8, though you'd really need to test to confirm. The XS quants fare worse, and the IQ4_XS even scores below the IQ3_M. The IQ2_XXS is trash. I also included the 70B Full FP16 scores; if the same pattern holds, you'd likely be better off with a 70B Q4_K_M (40GB) or IQ3_M (32GB).
In an ideal world, of course, you should test different quants on your own downstream tasks, but I understand that's not always an option. Based on this testing, though, if you had to pick one bang-for-buck quant blind, I'd start with the IQ3_M.
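
If you'd rather pick by memory budget than blind, a tiny helper (my own illustration, not a tool from this repo) using the sizes and Overall scores from the table shows the IQ3_M winning at any budget it fits in:

```python
# Given a memory budget in GB, pick the quant from the table above with
# the best Overall JA MT-Bench score that still fits. Data copied verbatim
# from the Quant Quality table; the helper itself is hypothetical.
QUANTS = [
    # (name, size_gb, overall_score)
    ("IQ3_M", 170, 9.04), ("Q4_K_M", 227, 9.03), ("Q8_0", 405, 9.02),
    ("W8A8-INT8", 405, 9.00), ("FP8-Dynamic", 405, 8.83),
    ("IQ3_XS", 155, 8.81), ("IQ4_XS", 202, 8.80), ("IQ2_XXS", 100, 7.47),
]

def best_quant_for_budget(budget_gb: float):
    """Return the (name, size, score) tuple that fits, or None."""
    fits = [q for q in QUANTS if q[1] <= budget_gb]
    return max(fits, key=lambda q: q[2]) if fits else None

print(best_quant_for_budget(192))  # -> ('IQ3_M', 170, 9.04)
```
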
## Making Quants
```