
Llamacpp Quantizations of Qwen2-7B

Using llama.cpp release b3583 for quantization.

Original model: https://huggingface.co/Qwen/Qwen2-7B
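
For reference, a typical llama.cpp quantization workflow looks roughly like the following; the paths and output names here are placeholders, not necessarily the exact commands used for this repository. First, convert the original Hugging Face checkpoint to a BF16 GGUF:

python convert_hf_to_gguf.py ./Qwen2-7B --outtype bf16 --outfile Qwen2-7B.BF16.gguf

Then quantize the BF16 file to one of the types listed below, e.g. Q4_K_M:

./llama-quantize Qwen2-7B.BF16.gguf Qwen2-7B-Q4_K_M.gguf Q4_K_M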

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Perplexity (wikitext-2-raw-v1.test) |
| -------- | ---------- | --------- | ----------------------------------- |
| Qwen2-7B.BF16.gguf | BF16 | 15.2GB | coming_soon |
| Qwen2-7B-Q8_0.gguf | Q8_0 | 8.1GB | 7.3817 +/- 0.04777 |
| Qwen2-7B-Q6_K.gguf | Q6_K | 6.25GB | 7.3914 +/- 0.04776 |
| Qwen2-7B-Q5_K_M.gguf | Q5_K_M | 5.44GB | 7.4067 +/- 0.04794 |
| Qwen2-7B-Q5_K_S.gguf | Q5_K_S | 5.32GB | 7.4291 +/- 0.04822 |
| Qwen2-7B-Q4_K_M.gguf | Q4_K_M | 4.68GB | 7.4796 +/- 0.04856 |
| Qwen2-7B-Q4_K_S.gguf | Q4_K_S | 4.46GB | 7.5221 +/- 0.04879 |
| Qwen2-7B-Q3_K_L.gguf | Q3_K_L | 4.09GB | 7.6843 +/- 0.05000 |
| Qwen2-7B-Q3_K_M.gguf | Q3_K_M | 3.81GB | 7.7390 +/- 0.05015 |
| Qwen2-7B-Q3_K_S.gguf | Q3_K_S | 3.49GB | 9.3743 +/- 0.06023 |
| Qwen2-7B-Q2_K.gguf | Q2_K | 3.02GB | 10.5122 +/- 0.06850 |
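
The perplexity values above were measured on the wikitext-2-raw-v1 test split. As a rough sketch of how such a measurement can be run with llama.cpp's perplexity tool (the exact options used here may differ, and the dataset path is an assumption):

./llama-perplexity -m Qwen2-7B-Q4_K_M.gguf -f wikitext-2-raw/wiki.test.raw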

Benchmark Results

Results have been computed using:

hellaswag_val_full

winogrande-debiased-eval

mmlu-validation

| Benchmark | Quant type | Metric |
| --------- | ---------- | ------ |
| WinoGrande (0-shot) | Q8_0 | 71.8232 +/- 1.2643 |
| WinoGrande (0-shot) | Q4_K_M | 71.3496 +/- 1.2707 |
| WinoGrande (0-shot) | Q3_K_M | 70.1657 +/- 1.2859 |
| WinoGrande (0-shot) | Q3_K_S | 70.3236 +/- 1.2839 |
| WinoGrande (0-shot) | Q2_K | 68.2715 +/- 1.3081 |
| HellaSwag (0-shot) | Q8_0 | 78.00238996 |
| HellaSwag (0-shot) | Q4_K_M | 77.92272456 |
| HellaSwag (0-shot) | Q3_K_M | 76.97669787 |
| HellaSwag (0-shot) | Q3_K_S | 74.96514639 |
| HellaSwag (0-shot) | Q2_K | 72.71459869 |
| MMLU (0-shot) | Q8_0 | 39.1473 +/- 1.2409 |
| MMLU (0-shot) | Q4_K_M | 38.5013 +/- 1.2372 |
| MMLU (0-shot) | Q3_K_M | 38.0491 +/- 1.2344 |
| MMLU (0-shot) | Q3_K_S | 39.3411 +/- 1.2420 |
| MMLU (0-shot) | Q2_K | 35.4005 +/- 1.2158 |
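
As a hedged sketch of how evaluations like these can be run, llama.cpp's perplexity tool has built-in modes for these task formats; the data file names below correspond to the datasets listed above but are assumptions, not necessarily the exact files or options used for this card:

./llama-perplexity -m Qwen2-7B-Q8_0.gguf --hellaswag -f hellaswag_val_full.txt

./llama-perplexity -m Qwen2-7B-Q8_0.gguf --winogrande -f winogrande-debiased-eval.csv

./llama-perplexity -m Qwen2-7B-Q8_0.gguf --multiple-choice -f mmlu-validation.bin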

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download fedric95/Qwen2-7B-GGUF --include "Qwen2-7B-Q4_K_M.gguf" --local-dir ./

If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:

huggingface-cli download fedric95/Qwen2-7B-GGUF --include "Qwen2-7B-Q8_0.gguf/*" --local-dir Qwen2-7B-Q8_0

You can either specify a new local-dir (Qwen2-7B-Q8_0) or download them all in place (./).
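
Once downloaded, the GGUF file can be used directly with llama.cpp. A minimal sketch, assuming the llama-cli binary from a recent llama.cpp build (prompt and token count are placeholders):

./llama-cli -m Qwen2-7B-Q4_K_M.gguf -p "The capital of France is" -n 64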

Reproducibility

Same instructions as in: https://github.com/ggerganov/llama.cpp/discussions/9020#discussioncomment-10335638
