---
license: apache-2.0
---

This repository contains improved Mistral-7B quantized models in GGUF format for use with `llama.cpp`. The models are fully compatible with the official `llama.cpp` release and can be used out of the box.

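As a quick usage sketch (assuming a built `llama.cpp` checkout; the binary names and flags have changed across releases — recent builds ship `llama-cli` instead of `main` — so check `--help` for your version):

```shell
# Run interactive generation with one of the quantized models from the table.
# Recent llama.cpp releases name this binary llama-cli; older builds use ./main.
./main -m mistral-7b-q4km.gguf -p "Building a website can be done in 10 simple steps:" -n 128

# Reproduce a Wikitext perplexity measurement at context length 512.
./perplexity -m mistral-7b-q4km.gguf -f wiki.test.raw -c 512
```

The file name and prompt above are illustrative; substitute any of the model files listed below.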
The table below compares these models with the current `llama.cpp` quantization approach, using Wikitext perplexities at a context length of 512 tokens.

The "Quantization Error" columns are defined as `(PPL(quantized model) - PPL(fp16))/PPL(fp16)`.

| Quantization | Model file | PPL (llama.cpp quants) | Quantization Error (llama.cpp) | PPL (new quants) | Quantization Error (new quants) |
|--:|--:|--:|--:|--:|--:|
| Q3_K_S | mistral-7b-q3ks.gguf | 6.0692 | 6.62% | 6.0021 | 5.44% |
| Q3_K_M | mistral-7b-q3km.gguf | 5.8894 | 3.46% | 5.8489 | 2.75% |
| Q4_K_S | mistral-7b-q4ks.gguf | 5.7764 | 1.48% | 5.7349 | 0.75% |
| Q4_K_M | mistral-7b-q4km.gguf | 5.7539 | 1.08% | 5.7259 | 0.59% |
| Q5_K_S | mistral-7b-q5ks.gguf | 5.7258 | 0.59% | 5.7100 | 0.31% |
| Q4_0 | mistral-7b-q40.gguf | 5.8189 | 2.23% | 5.7924 | 1.76% |
| Q4_1 | mistral-7b-q41.gguf | 5.8244 | 2.32% | 5.7455 | 0.94% |
| Q5_0 | mistral-7b-q50.gguf | 5.7180 | 0.45% | 5.7070 | 0.26% |
| Q5_1 | mistral-7b-q51.gguf | 5.7128 | 0.36% | 5.7057 | 0.24% |

In addition, a 2-bit model (`mistral-7b-q2k-extra-small.gguf`) is provided. Its perplexity is 6.7099 at a context length of 512, and 5.5744 at a context length of 4096.
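The quantization-error figures in the table can be recomputed from the perplexities. Note that the fp16 perplexity is not listed in this card; the value below (≈ 5.6922) is back-computed from the table rows (e.g. 5.7539 / 1.0108 for Q4_K_M) and is therefore an estimate, not an official number.

```python
def quant_error(ppl_quant: float, ppl_fp16: float) -> float:
    """Relative perplexity increase, as defined above:
    (PPL(quantized model) - PPL(fp16)) / PPL(fp16)."""
    return (ppl_quant - ppl_fp16) / ppl_fp16

# Estimated fp16 Wikitext perplexity at context 512, back-computed
# from the table rows; not stated explicitly in this card.
PPL_FP16 = 5.6922

# A few "new quants" perplexities taken from the table.
new_quants = {
    "Q3_K_S": 6.0021,
    "Q4_K_S": 5.7349,
    "Q5_1": 5.7057,
}

for name, ppl in new_quants.items():
    print(f"{name}: {100 * quant_error(ppl, PPL_FP16):.2f}%")
```

With this estimate, the recomputed percentages round to the values shown in the table (5.44%, 0.75%, and 0.24% for the three models above).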