Chinese-LLaMA-2-7B-64K

This repository contains the GGUF-v3 version (llama.cpp compatible) of Chinese-LLaMA-2-7B-64K, which is fine-tuned from Chinese-LLaMA-2-7B using the YaRN method.

Performance

Metric: perplexity (PPL); lower is better.

Quant	original	imatrix (`-im`)
Q2_K	11.5424 +/- 0.24106	12.1599 +/- 0.26050
Q3_K	10.0152 +/- 0.21296	9.9269 +/- 0.21335
Q4_0	9.7500 +/- 0.20872	-
Q4_K	9.7687 +/- 0.21133	9.7239 +/- 0.20999
Q5_0	9.4647 +/- 0.20280	-
Q5_K	9.6229 +/- 0.20829	9.5673 +/- 0.20675
Q6_K	9.5996 +/- 0.20816	9.5753 +/- 0.20734
Q8_0	9.4078 +/- 0.20378	-
F16	9.5750 +/- 0.20735	-
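
For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token. The snippet below is a minimal sketch of that definition only; the function name `perplexity` and the toy inputs are hypothetical, and it is not meant to reproduce how the numbers in the table were measured.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from natural-log probabilities of each token in a text.

    PPL = exp(-(1/N) * sum(log p_i)); lower means the model is less "surprised".
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Toy example: a model that assigns probability 0.25 to every token
# is as uncertain as a uniform choice among 4 options.
toy_logprobs = [math.log(0.25)] * 4
toy_ppl = perplexity(toy_logprobs)
```

In this toy case the perplexity is 4 (up to floating-point rounding), matching the intuition of a uniform choice among four options.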

Models with the `-im` suffix are quantized with an importance matrix (imatrix), which generally gives lower perplexity, though not always: in the table above, Q2_K is worse with the imatrix than without it.
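
To make the comparison concrete, the sketch below computes the relative PPL change between the original and imatrix variants, using only the values copied from the table. The helper name `relative_change` is hypothetical, and only quantization types with both columns filled in are included.

```python
# PPL values copied from the table above: (original, imatrix)
ppl = {
    "Q2_K": (11.5424, 12.1599),
    "Q3_K": (10.0152, 9.9269),
    "Q4_K": (9.7687, 9.7239),
    "Q5_K": (9.6229, 9.5673),
    "Q6_K": (9.5996, 9.5753),
}

def relative_change(original, imatrix):
    """Percent change in PPL when using the imatrix; negative means improvement."""
    return (imatrix - original) / original * 100.0

changes = {quant: relative_change(o, i) for quant, (o, i) in ppl.items()}

for quant, pct in changes.items():
    print(f"{quant}: {pct:+.2f}%")
```

A negative percentage means the imatrix variant has lower perplexity. Note that for Q3_K through Q6_K the differences are smaller than the reported +/- uncertainty, so they should be read as indicative rather than conclusive.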

Others

For the full model in Hugging Face format, see: https://huggingface.co/hfl/chinese-llama-2-7b-64k

Please refer to https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/ for more details.