---
license: apache-2.0
tags:
- not-for-all-audiences
- writing
- roleplay
- gguf
- gguf-imatrix
base_model:
- nakodanei/Blue-Orchid-2x7b
model_type: mixtral
quantized_by: Green-Sky
language:
- en
---
llama.cpp conversion of https://huggingface.co/nakodanei/Blue-Orchid-2x7b/

Except for f16 and q8_0, every quant was made using `merge.imatrix`.

`merge.imatrix` is a merge of `kalomaze-group_10_merged.172chunks.imatrix` and `wiki.train.400chunks.imatrix`, which took ~10 min + ~20 min to calculate on my machine. Computing an imatrix over the full wiki.train set would have taken ~10 h.

For more info on imatrix handling see https://github.com/ggerganov/llama.cpp/pull/5302
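For reference, a minimal sketch of this workflow with llama.cpp's `imatrix` and `quantize` tools. File names follow the ones above; the exact binary names (newer builds use `llama-imatrix` / `llama-quantize`) and the `--in-file` merge step are assumptions that depend on your llama.cpp version:

```sh
# compute an imatrix over the first 400 chunks of wiki.train
./imatrix -m Blue-Orchid-2x7b-f16.gguf -f wiki.train.raw \
  -o wiki.train.400chunks.imatrix --chunks 400

# combine two existing imatrix files into one (assumes --in-file support)
./imatrix --in-file kalomaze-group_10_merged.172chunks.imatrix \
  --in-file wiki.train.400chunks.imatrix -o merge.imatrix

# quantize with the merged imatrix
./quantize --imatrix merge.imatrix Blue-Orchid-2x7b-f16.gguf \
  Blue-Orchid-2x7b-Q4_K_M.gguf Q4_K_M
```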
### ppl (wiki.test, ctx 512, 300 chunks)
| quant                  | ppl (lower is better) |
|------------------------|-----------------------|
| f16 (baseline)         | 5.8839 +/- 0.05173 |
| q8_0                   | 5.8880 +/- 0.05178 |
| q5_k_m                 | 5.8912 +/- 0.05177 |
| q5_k_m (without-imat)  | 5.8893 +/- 0.05174 |
| q4_k_m                 | 5.9248 +/- 0.05216 |
| q4_k_m (without-imat)  | 5.9492 +/- 0.05249 |
| iq3_xxs                | 6.1984 +/- 0.05475 |
| iq3_xxs (only-wiki)    | 6.1796 +/- 0.05446 |
| iq3_xxs (only-kal)     | 6.1984 +/- 0.05475 |
| iq3_xxs (without-imat) | 6.4228 +/- 0.05756 |
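Numbers like these come from llama.cpp's `perplexity` tool; a sketch of an invocation matching the parameters above (the quant file name is hypothetical):

```sh
# ppl on wiki.test: context size 512, first 300 chunks
./perplexity -m Blue-Orchid-2x7b-Q4_K_M.gguf -f wiki.test.raw -c 512 --chunks 300
```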
### Interesting observations
Despite `merge.imatrix` being different from `kalomaze-group_10_merged.172chunks.imatrix`, they produce the exact same quantized iq3_xxs model file (same hash, checked multiple times).
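A quick way to check for bit-identical outputs (file names are hypothetical):

```sh
# identical digests mean the two quantizations produced the same file
sha256sum Blue-Orchid-2x7b-IQ3_XXS.merge.gguf Blue-Orchid-2x7b-IQ3_XXS.only-kal.gguf
```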
q5_k_m is the one case where the quant has a slightly lower perplexity without the imatrix than with it, but that is probably caused by kalomaze-group_10_merged diverging enough from wiki.