hfl
/

chinese-mixtral-instruct-gguf

Mixture of Experts

Inference Endpoints

Model card Files Files and versions

chinese-mixtral-instruct-gguf / README.md

hfl-rc's picture

Update README.md

1093147 verified 8 months ago

|

2.49 kB

	---
	license: apache-2.0
	language:
	- zh
	- en
	tags:
	- moe
	---

	# Chinese-Mixtral-Instruct-GGUF
	<p align="center">
	<a href="https://github.com/ymcui/Chinese-Mixtral"><img src="https://ymcui.com/images/chinese-mixtral-banner.png" width="600"/></a>
	</p>

	Chinese Mixtral GitHub repository: https://github.com/ymcui/Chinese-Mixtral

	This repository contains the GGUF-v3 models (llama.cpp compatible) for Chinese-Mixtral-Instruct (chat/instruction model).

	Note: When using instruction/chat model, you MUST follow the official prompt template! Example: [chat.sh](https://github.com/ymcui/Chinese-Mixtral/blob/main/scripts/llamacpp/chat.sh)

	## Performance

	Metric: PPL, lower is better

	\| Quant \| Size ↓ \| PPL \|
	\| ------- \| ------- \| ------------------ \|
	\| IQ1_S \| 9.8 GB \| 9.5782 +/- 0.08909 \|
	\| IQ1_M \| 10.8 GB \| 7.4666 +/- 0.06741 \|
	\| IQ2_XXS \| 12.3 GB \| 6.3923 +/- 0.05674 \|
	\| IQ2_XS \| 13.7 GB \| 6.0606 +/- 0.05834 \|
	\| IQ2_S \| 14.1 GB \| 4.7617 +/- 0.04177 \|
	\| IQ2_M \| 15.5 GB \| 4.5911 +/- 0.04054 \|
	\| Q2_K \| 17.3 GB \| 4.8592 +/- 0.04303 \|
	\| IQ3_XXS \| 18.3 GB \| 4.3557 +/- 0.03846 \|
	\| IQ3_XS \| 19.3 GB \| 4.3328 +/- 0.03779 \|
	\| IQ3_S \| 20.4 GB \| 4.3138 +/- 0.03785 \|
	\| IQ3_M \| 21.4 GB \| 4.3024 +/- 0.03775 \|
	\| Q3_K \| 22.5 GB \| 4.4334 +/- 0.03937 \|
	\| IQ4_XS \| 25.1 GB \| 4.2324 +/- 0.03757 \|
	\| Q4_0 \| 26.4 GB \| 4.2688 +/- 0.03787 \|
	\| IQ4_NL \| 26.5 GB \| 4.2384 +/- 0.03763 \|
	\| Q4_K \| 28.4 GB \| 4.2433 +/- 0.03768 \|
	\| Q5_0 \| 32.2 GB \| 4.2142 +/- 0.03733 \|
	\| Q5_K \| 33.2 GB \| 4.2177 +/- 0.03743 \|
	\| Q6_K \| 38.4 GB \| 4.2184 +/- 0.03754 \|
	\| Q8_0 \| 49.6 GB \| 4.2053 +/- 0.03732 \|
	\| F16 \| 93.5 GB \| x \|

	Due to the file size limitation, for F16 model, please use `cat` command to concatenate all parts into a single file. You must concatenate these parts in order.


	## Others

	For Hugging Face version, please see: https://huggingface.co/hfl/chinese-mixtral-instruct

	Please refer to [https://github.com/ymcui/Chinese-Mixtral/](https://github.com/ymcui/Chinese-Mixtral/) for more details.


	## Citation

	Please consider cite our paper if you use the resource of this repository.
	Paper link: https://arxiv.org/abs/2403.01851
	```
	@article{chinese-mixtral,
	title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
	author={Cui, Yiming and Yao, Xin},
	journal={arXiv preprint arXiv:2403.01851},
	url={https://arxiv.org/abs/2403.01851},
	year={2024}
	}
	```