Update README.md

This repo contains select GGUF quants of [shisa-ai/shisa-v2-llama3.1-405b](https://huggingface.co/shisa-ai/shisa-v2-llama3.1-405b).

## Provided Quants

| Type | Size (GiB) |
|:--------|----------:|
| IQ2_XXS | 100 |
| IQ3_XS | 155 |
| … | … |
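
If you only want one quant, you can pull it selectively rather than cloning the whole repo. A minimal sketch using `huggingface_hub` — the repo id and filename pattern below are placeholders (check this repo's actual file list; GGUFs this large are typically split into multi-part files):

```python
from huggingface_hub import snapshot_download

# Placeholder repo id and glob -- check the real file names in this repo's
# file list. Quants this size are usually split into
# *-00001-of-0000N.gguf parts, so a pattern grabs all the pieces.
snapshot_download(
    repo_id="shisa-ai/shisa-v2-llama3.1-405b-GGUF",  # hypothetical repo id
    allow_patterns=["*IQ3_M*"],                       # fetch just the IQ3_M quant
    local_dir="./models",
)
```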

## Quant Quality
All quants have been tested with JA MT-Bench (judged by GPT-4.1) as a rough guide for quality:

| Quant | Size (GiB) | % Diff | Overall | Writing | Roleplay | Reasoning | Math | Coding | Extraction | STEM | Humanities |
|--------------|--------:|-------------:|---------:|----------:|---------:|----------:|---------:|---------:|-----------:|---------:|-----------:|
| Full FP16 | 810 | | **9.13** | 9.25 | **9.55** | 8.15 | 8.90 | 9.10 | 9.65 | 9.10 | 9.35 |
| IQ3_M | 170 | -0.99 | 9.04 | 8.90 | 9.45 | 7.75 | 8.95 | 8.95 | 9.70 | **9.15** | 9.50 |
| … | … | … | … | … | … | … | … | … | … | … | … |
| *70B FP16* | 140 | -7.89 | 8.41 | 7.95 | 9.05 | 6.25 | 8.30 | 8.25 | 9.70 | 8.70 | 9.05 |
| IQ2_XXS | 100 | -18.18 | 7.47 | 7.50 | 6.80 | 5.15 | 7.55 | 7.30 | 9.05 | 7.65 | 8.80 |
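
For reference, the % Diff column works out to each quant's Overall score relative to the Full FP16 baseline; a quick check against the rows above:

```python
# % Diff = relative change in Overall JA MT-Bench score vs the Full FP16 baseline.
# Overall scores taken from the rows shown above.
scores = {"Full FP16": 9.13, "IQ3_M": 9.04, "70B FP16": 8.41, "IQ2_XXS": 7.47}

baseline = scores["Full FP16"]
for quant, overall in scores.items():
    print(f"{quant:>9}: {(overall - baseline) / baseline * 100:+.2f}%")
# IQ3_M -> -0.99%, 70B FP16 -> -7.89%, IQ2_XXS -> -18.18%, matching the table.
```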

Given the margin of error, it's probably fair to say that the IQ3_M, Q4_K_M, and Q8_0 GGUFs have almost no functional loss versus the FP16.

Interestingly, while roleplay takes one of the biggest hits, writing actually seems to improve at Q4 and Q8? You'd need more testing (more samples, more runs, more evals) to really see what's going on. Also interesting: the XS quants track pretty consistently, with the IQ4_XS doing worse than the IQ3_M.

The IQ2_XXS scores extremely poorly. I included the 70B Full FP16 scores as a baseline, and I'd expect you'd be better off running a decent Shisa V2 70B Q4_K_M (40 GB) or IQ3_M (32 GB) than the IQ2_XXS.
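
To put those sizes in perspective, you can back out an approximate bits-per-weight figure from file size and parameter count. A rough sketch (nominal 405B/70B parameter counts; the point is that a ~2.1 bpw 405B is competing with a ~4.9 bpw 70B):

```python
# Rough bits-per-weight from GGUF file size; sizes from the tables above,
# parameter counts are nominal (405e9 and 70e9).
GIB = 2**30

def bits_per_weight(size_gib: float, n_params: float) -> float:
    return size_gib * GIB * 8 / n_params

print(f"{bits_per_weight(100, 405e9):.1f}")  # 405B IQ2_XXS -> ~2.1 bpw
print(f"{bits_per_weight(170, 405e9):.1f}")  # 405B IQ3_M   -> ~3.6 bpw
print(f"{bits_per_weight(40, 70e9):.1f}")    # 70B Q4_K_M   -> ~4.9 bpw
```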

In an ideal world, of course, you should test different quants on your downstream tasks, but I understand that that's not always an option. Based on this testing though, if you had to pick one bang/buck quant blind, I'd start with the IQ3_M.
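
If you do get the chance to test on your own tasks, even something crude beats guessing. A minimal sketch with `llama-cpp-python` (the model paths are placeholders for whichever quants you downloaded): run the same prompt through each quant and compare the outputs.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder paths -- point these at the quants you actually downloaded.
for path in ["./models/IQ3_M.gguf", "./models/IQ4_XS.gguf"]:
    llm = Llama(model_path=path, n_gpu_layers=-1, n_ctx=4096, verbose=False)
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "..."}],  # your downstream prompt
        max_tokens=256,
    )
    print(path, "->", out["choices"][0]["message"]["content"][:200])
```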