hfl
/

chinese-mixtral-instruct-gguf

Mixture of Experts

Inference Endpoints

Model card Files Files and versions

chinese-mixtral-instruct-gguf / README.md

hfl-rc's picture

Update README.md

88b853d verified 10 months ago

|

1.52 kB

	---
	license: apache-2.0
	language:
	- zh
	- en
	tags:
	- moe
	---

	# Chinese-Mixtral-Instruct-GGUF
	<p align="center">
	<a href="https://github.com/ymcui/Chinese-Mixtral"><img src="https://ymcui.com/images/chinese-mixtral-banner.png" width="600"/></a>
	</p>

	Chinese Mixtral GitHub repository: https://github.com/ymcui/Chinese-Mixtral

	This repository contains the GGUF-v3 models (llama.cpp compatible) for Chinese-Mixtral-Instruct (chat/instruction model).

	Note: When using instruction/chat model, you MUST follow the official prompt template! Example: [chat.sh](https://github.com/ymcui/Chinese-Mixtral/blob/main/scripts/llamacpp/chat.sh)

	## Performance

	Metric: PPL, lower is better

	\| Quant \| PPL \|
	\| ----- \| ---- \|
	\| IQ2_XXS \| 6.7233 +/- 0.06118 \|
	\| IQ2_XS \| 7.4175 +/- 0.08420 \|
	\| Q2_K \| 4.5758 +/- 0.03959 \|
	\| IQ3_XXS \| 4.0389 +/- 0.03489 \|
	\| Q3_K \| 4.5563 +/- 0.04126 \|
	\| Q4_0 \| 3.9757 +/- 0.03455 \|
	\| Q4_K \| 3.9265 +/- 0.03407 \|
	\| Q5_0 \| 3.9167 +/- 0.03399 \|
	\| Q5_K \| 3.9232 +/- 0.03403 \|
	\| Q6_K \| 3.9242 +/- 0.03415 \|
	\| Q8_0 \| 3.9159 +/- 0.03402 \|
	\| F16 \| x \|

	Due to the file size limitation, for F16 model, please use `cat` command to concatenate all parts into a single file. You must concatenate these parts in order.


	## Others

	For Hugging Face version, please see: https://huggingface.co/hfl/chinese-mixtral-instruct

	Please refer to [https://github.com/ymcui/Chinese-Mixtral/](https://github.com/ymcui/Chinese-Mixtral/) for more details.