---
license: apache-2.0
tags:
- not-for-all-audiences
- writing
- roleplay
- gguf
- gguf-imatrix
base_model:
- nakodanei/Blue-Orchid-2x7b
model_type: mixtral
quantized_by: Green-Sky
language:
- en
---

llama.cpp conversion of https://huggingface.co/nakodanei/Blue-Orchid-2x7b/
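
For reference, a minimal sketch of the conversion step with llama.cpp's `convert.py` (output file name is illustrative, not the exact command used):

```sh
# Convert the HF checkpoint to a f16 GGUF.
python convert.py ./Blue-Orchid-2x7b \
    --outtype f16 \
    --outfile blue-orchid-2x7b.f16.gguf
```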

Except for f16 and q8_0, every quant uses `merge.imatrix`.

`merge.imatrix` is a merge of `kalomaze-group_10_merged.172chunks.imatrix` and `wiki.train.400chunks.imatrix`, which took ~10min + ~20min to calculate on my machine.

Running the full wiki.train dataset would have taken ~10h.
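
A rough sketch of how the imatrix files can be computed and combined with llama.cpp's `imatrix` tool (file names and chunk counts mirror the ones above; `--in-file` merging is a newer imatrix option, so treat this as illustrative, not the exact commands used):

```sh
# Compute an importance matrix over 400 chunks of wiki.train.
./imatrix -m blue-orchid-2x7b.f16.gguf \
    -f wiki.train.raw --chunks 400 \
    -o wiki.train.400chunks.imatrix

# Combine two existing imatrix files into merge.imatrix
# (--in-file was added to imatrix later; illustrative only).
./imatrix --in-file kalomaze-group_10_merged.172chunks.imatrix \
    --in-file wiki.train.400chunks.imatrix \
    -o merge.imatrix
```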

For more info on imatrix handling, see https://github.com/ggerganov/llama.cpp/pull/5302
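
The imatrix is then applied at quantization time via the `quantize` tool's `--imatrix` flag; a minimal sketch (file names illustrative):

```sh
# Quantize with the merged importance matrix guiding the low-bit quants.
./quantize --imatrix merge.imatrix \
    blue-orchid-2x7b.f16.gguf blue-orchid-2x7b.iq3_xxs.gguf IQ3_XXS
```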

### ppl (wiki.test, ctx 512, 300 chunks)
| quant              | ppl (lower is better) |
|--------------------|-----|
| f16(baseline)      | 5.8839 +/- 0.05173 |
| q8_0               | 5.8880 +/- 0.05178 |
| q5_k_m             | 5.8912 +/- 0.05177 |
| q5_k_m(without-imat) | 5.8893 +/- 0.05174 |
| q4_k_m             | 5.9248 +/- 0.05216 |
| q4_k_m(without-imat) | 5.9492 +/- 0.05249 |
| iq3_xxs            | 6.1984 +/- 0.05475 |
| iq3_xxs(only-wiki) | 6.1796 +/- 0.05446 |
| iq3_xxs(only-kal)  | 6.1984 +/- 0.05475 |
| iq3_xxs(without-imat) | 6.4228 +/- 0.05756 |
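
A sketch of an equivalent `perplexity` invocation for these measurements (model file name illustrative):

```sh
# Perplexity over the first 300 chunks of wiki.test at a context of 512.
./perplexity -m blue-orchid-2x7b.q4_k_m.gguf \
    -f wiki.test.raw -c 512 --chunks 300
```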

### Interesting observations
Despite `merge.imatrix` being different from `kalomaze-group_10_merged.172chunks.imatrix`, the two produce the exact same quantized iq3_xxs model file (same hash, checked multiple times).
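
The comparison is a plain hash check, e.g. (file names illustrative):

```sh
# Same digest => byte-identical quantized files.
sha256sum blue-orchid-2x7b.iq3_xxs.merge.gguf \
          blue-orchid-2x7b.iq3_xxs.kal.gguf
```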

q5_k_m has a slightly lower perplexity without the imatrix (see the table above), but that is probably caused by kalomaze-group_10_merged diverging enough from wiki.