shisa-ai/shisa-v2-llama3.1-405b-GGUF

leonardlin committed on Jun 7

Commit 45041fb · verified · 1 Parent(s): dde1c7b

quant quality table

Files changed (1)

README.md +20 -2

@@ -31,9 +31,27 @@ This repo contains select GGUF quants of [shisa-ai/shisa-v2-llama3.1-405b](https
 | Q4_K_M  | 227       |
 | Q8_0    | 402       |
-Graph by ikawrakow comparing some lower-quality quant PPL (lower is better):
-![image.png](quantpplgraph.png)
 ## Making Quants
 ```

 | Q4_K_M  | 227       |
 | Q8_0    | 402       |
+## Quant Quality
+All quants have been tested with JA MT-Bench (judged by GPT-4.1) as a rough guide for quality:
+| Quant        | Size(GB)| % Diff       | Overall  | Writing   | Roleplay | Reasoning | Math     | Coding   | Extraction | STEM     | Humanities |
+|--------------|--------:|-------------:|---------:|----------:|---------:|----------:|---------:|---------:|-----------:|---------:|-----------:|
+| Full FP16    | 810     |              | **9.13** | 9.25      | **9.55** | 8.15      | 8.90     | **9.10** | 9.65       | 9.10     | 9.35       |
+| IQ3_M        | 170     | -0.99        | 9.04     | 8.90      | 9.45     | 7.75      | 8.95     | 8.95     | 9.70       | **9.15** | 9.50       |
+| Q4_K_M       | 227     | -1.10        | 9.03     | **9.40**  | 9.00     | 8.25      | 8.85     | **9.10** | 9.50       | 8.90     | 9.25       |
+| Q8_0         | 405     | -1.20        | 9.02     | **9.40**  | 9.05     | **8.30**  | **9.20** | 8.70     | 9.50       | 8.45     | 9.55       |
+| W8A8-INT8    | 405     | -1.42        | 9.00     | 9.20      | 9.35     | 7.80      | 8.75     | 9.00     | 9.80       | 8.65     | 9.45       |
+| FP8-Dynamic  | 405     | -3.29        | 8.83     | 8.70      | 9.20     | 7.85      | 8.80     | 8.65     | 9.30       | 8.80     | 9.35       |
+| IQ3_XS       | 155     | -3.50        | 8.81     | 8.70      | 9.05     | 7.70      | 8.60     | 8.95     | 9.35       | 8.70     | 9.45       |
+| IQ4_XS       | 202     | -3.61        | 8.80     | 8.85      | **9.55** | 6.90      | 8.35     | 8.60     | **9.90**   | 8.65     | **9.60**   |
+| *70B FP16*   | 140     | -7.89        | 8.41     | 7.95      | 9.05     | 6.25      | 8.30     | 8.25     | 9.70       | 8.70     | 9.05       |
+| IQ2_XXS      | 100     | -18.18       | 7.47     | 7.50      | 6.80     | 5.15      | 7.55     | 7.30     | 9.05       | 7.65     | 8.80       |
+Given the margin of error, you could fairly say that the IQ3_M, Q4_K_M, and Q8_0 GGUFs have almost no functional loss versus the FP16. Interestingly enough, while roleplay takes one of the biggest hits, writing seems to improve on the Q4 and Q8? I think you'd really need to test. Also interesting: the XS quants fare worse, and the IQ4_XS does worse than the IQ3_M. The IQ2_XXS is trash. I included the 70B Full FP16 scores for comparison, and if the same pattern holds, I'd think you'd be better off with a 70B Q4_K_M (40GB) or IQ3_M (32GB) than with the 405B IQ2_XXS (100GB).
+In an ideal world, of course, you should test different quants on your downstream tasks, but I understand that's not always an option. Based on this testing though, if you had to pick one bang-for-the-buck quant blind, I'd start with the IQ3_M.
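
Reviewer note: the `% Diff` column added above appears to be the relative change in the Overall score versus the Full FP16 baseline (9.13). The README does not state the formula, so treat that as an assumption. It does reproduce every row of the table when computed from the rounded Overall scores. A minimal sketch, with the scores copied from the table and `pct_diff` as a hypothetical helper name:

```python
# Overall JA MT-Bench scores copied from the table in this commit.
FP16_OVERALL = 9.13

overall = {
    "IQ3_M": 9.04,
    "Q4_K_M": 9.03,
    "Q8_0": 9.02,
    "W8A8-INT8": 9.00,
    "FP8-Dynamic": 8.83,
    "IQ3_XS": 8.81,
    "IQ4_XS": 8.80,
    "70B FP16": 8.41,
    "IQ2_XXS": 7.47,
}


def pct_diff(score, baseline=FP16_OVERALL):
    """Relative change versus the baseline, in percent, rounded to 2 places.

    Assumption: this is how the table's % Diff column was derived.
    """
    return round((score - baseline) / baseline * 100, 2)


diffs = {name: pct_diff(score) for name, score in overall.items()}

for name, diff in diffs.items():
    print(f"{name:12s} {diff:+.2f}")
```

Because the inputs are already rounded to two decimals, a recomputation from unrounded benchmark scores could differ slightly in the last digit.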
 ## Making Quants
 ```