Tags: Transformers · GGUF · imatrix · conversational
leonardlin committed on
Commit 0333fae · verified · Parent: 45041fb

Update README.md

Files changed (1): README.md (+7 −3)
README.md CHANGED

@@ -22,7 +22,7 @@ This repo contains select GGUF quants of [shisa-ai/shisa-v2-llama3.1-405b](https
 
 ## Provided Quants
 
-| Type | Size (GB) |
+| Type | Size (GiB) |
 |:--------|----------:|
 | IQ2_XXS | 100 |
 | IQ3_XS | 155 |
@@ -35,7 +35,7 @@ This repo contains select GGUF quants of [shisa-ai/shisa-v2-llama3.1-405b](https
 ## Quant Quality
 All quants have been tested with JA MT-Bench (judged by GPT-4.1) as a rough guide for quality:
 
-| Quant | Size(GB)| % Diff | Overall | Writing | Roleplay | Reasoning | Math | Coding | Extraction | STEM | Humanities |
+| Quant | Size (GiB)| % Diff | Overall | Writing | Roleplay | Reasoning | Math | Coding | Extraction | STEM | Humanities |
 |--------------|--------:|-------------:|---------:|----------:|---------:|----------:|---------:|---------:|-----------:|---------:|-----------:|
 | Full FP16 | 810 | | **9.13** | 9.25 | **9.55** | 8.15 | 8.90 | 9.10 | 9.65 | 9.10 | 9.35 |
 | IQ3_M | 170 | -0.99 | 9.04 | 8.90 | 9.45 | 7.75 | 8.95 | 8.95 | 9.70 | **9.15** | 9.50 |
@@ -48,7 +48,11 @@ All quants have been tested with JA MT-Bench (judged by GPT-4.1) as a rough guid
 | *70B FP16* | 140 | -7.89 | 8.41 | 7.95 | 9.05 | 6.25 | 8.30 | 8.25 | 9.70 | 8.70 | 9.05 |
 | IQ2_XXS | 100 | -18.18 | 7.47 | 7.50 | 6.80 | 5.15 | 7.55 | 7.30 | 9.05 | 7.65 | 8.80 |
 
-Due to margin of error, you could probably fairly say that the IQ3_M, Q4_K_M, and Q8_0 GGUFs have almost no functional loss versus the FP16. Interestingly enough, while roleplay takes one of the biggest hits, writing seems to be improved on the Q4 and Q8? I think you'd really need to test. Interestingly the XS quants fare worses and the IQ4_XS does worse than the IQ3_M. The IQ2_XXS is trash and I included the 70B Full FP16 scores, but if the same pattern holds, I'd think you'd be better off with a 70B Q4_K_M (40GB) or IQ3_M (32GB).
+Due to margin of error, you could probably fairly say that the IQ3_M, Q4_K_M, and Q8_0 GGUFs have almost no functional loss versus the FP16.
+
+Interestingly enough, while roleplay takes one of the biggest hits, writing seems to be improved on the Q4 and Q8? I think you'd really need to test more (more samples, more runs, more evals) to really see what's going on. Also interestingly the XS quants track pretty consistently, with the IQ4_XS doing worse than the IQ3_M.
+
+The IQ2_XXS scores extremely poorly. I included the 70B Full FP16 scores as a baseline and I'd expect you'd be better off running a decent Shisa V2 70B Q4_K_M (40GB) or IQ3_M (32GB) vs the IQ2.
 
 In an ideal world, of course, you should test different quants on your downstream tasks, but I understand that that's not always an option. Based on this testing though, if you had to pick on bang/buck quant blind, I'd start with the IQ3_M.
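For reference, the % Diff column in the table appears to be the overall score relative to the Full FP16 baseline (9.13). A minimal sketch of that arithmetic, assuming % Diff = (score − baseline) / baseline × 100, with score values copied from the table:

```python
# Baseline: Full FP16 overall JA MT-Bench score from the README table.
FP16_OVERALL = 9.13

# Overall scores for a few rows of the table.
quants = {
    "IQ3_M": 9.04,
    "70B FP16": 8.41,
    "IQ2_XXS": 7.47,
}

def pct_diff(score: float, baseline: float = FP16_OVERALL) -> float:
    """Percent difference of an overall score vs. the FP16 baseline,
    rounded to two decimals as in the table's % Diff column."""
    return round((score - baseline) / baseline * 100, 2)

for name, score in quants.items():
    print(f"{name}: {pct_diff(score):+.2f}%")
# Matches the table: IQ3_M -0.99, 70B FP16 -7.89, IQ2_XXS -18.18
```

This reproduces the table's -0.99 / -7.89 / -18.18 values exactly, which supports reading % Diff as relative loss against the 405B FP16 overall score.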