---
license: apache-2.0
language:
- zh
- en
tags:
- moe
---
# Chinese-Mixtral-Instruct-GGUF
<p align="center">
<a href="https://github.com/ymcui/Chinese-Mixtral"><img src="https://ymcui.com/images/chinese-mixtral-banner.png" width="600"/></a>
</p>
**Chinese Mixtral GitHub repository: https://github.com/ymcui/Chinese-Mixtral**

This repository contains the GGUF-v3 models (llama.cpp compatible) for **Chinese-Mixtral-Instruct** (chat/instruction model).

**Note: When using the instruction/chat model, you MUST follow the official prompt template! Example: [chat.sh](https://github.com/ymcui/Chinese-Mixtral/blob/main/scripts/llamacpp/chat.sh)**
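As an illustration only (not the official script), below is a minimal llama.cpp invocation that assumes the upstream Mixtral-Instruct style `[INST] ... [/INST]` prompt format; the GGUF filename is a placeholder, and [chat.sh](https://github.com/ymcui/Chinese-Mixtral/blob/main/scripts/llamacpp/chat.sh) remains the authoritative reference for the exact template.

```bash
# Illustrative sketch only -- see the official chat.sh for the exact template and options.
# Assumption: the model uses the Mixtral-Instruct style "[INST] ... [/INST]" prompt format.
# The GGUF filename is a placeholder; use the file you actually downloaded from this repo.
./main -m chinese-mixtral-instruct-q4_k.gguf \
       -p "[INST] 请简要介绍一下你自己。 [/INST]" \
       -n 256
```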
## Performance
Metric: PPL, lower is better

| Quant | PPL |
| ----- | ---- |
| IQ1_S | 27.7911 +/- 0.27400 |
| IQ2_XXS | 6.7233 +/- 0.06118 |
| IQ2_XS | 7.4175 +/- 0.08420 |
| Q2_K | 4.5758 +/- 0.03959 |
| IQ3_XXS | 4.0389 +/- 0.03489 |
| Q3_K | 4.5563 +/- 0.04126 |
| Q4_0 | 3.9757 +/- 0.03455 |
| Q4_K | 3.9265 +/- 0.03407 |
| Q5_0 | 3.9167 +/- 0.03399 |
| Q5_K | 3.9232 +/- 0.03403 |
| Q6_K | 3.9242 +/- 0.03415 |
| Q8_0 | 3.9159 +/- 0.03402 |
| F16 | N/A |
Due to the file size limitation, the F16 model is split into multiple parts. Please use the `cat` command to concatenate all parts into a single file. **You must concatenate the parts in order.**
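A minimal sketch of the concatenation step; the part filenames below are placeholders (check the actual filenames in this repository), and the shell glob relies on the parts sorting alphabetically in the correct order.

```bash
# Concatenate the split F16 parts (in order) into a single GGUF file.
# Placeholder filenames -- substitute the actual part names from this repository.
cat chinese-mixtral-instruct-f16.gguf.part* > chinese-mixtral-instruct-f16.gguf
```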
## Others
For the Hugging Face version, please see: https://huggingface.co/hfl/chinese-mixtral-instruct
Please refer to [https://github.com/ymcui/Chinese-Mixtral/](https://github.com/ymcui/Chinese-Mixtral/) for more details.
## Citation
Please consider citing our paper if you use the resources of this repository.
Paper link: https://arxiv.org/abs/2403.01851
```
@article{chinese-mixtral,
  title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  journal={arXiv preprint arXiv:2403.01851},
  url={https://arxiv.org/abs/2403.01851},
  year={2024}
}
```