Qwen2.5-0.5B quantized to 4 bits with per-tensor scales. Performance is comparable to GPTQ (group size 128, desc_act).

| Tasks | Version | Filter | n-shot | Metric | f16 | this model | GPTQ |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc ↑ | 0.2918 | 0.2705 | 0.2730 |
| arc_easy | 1 | none | 0 | acc ↑ | 0.6465 | 0.6393 | 0.6031 |
| boolq | 2 | none | 0 | acc ↑ | 0.6208 | 0.5862 | 0.6232 |
| hellaswag | 1 | none | 0 | acc ↑ | 0.4061 | 0.3888 | 0.3969 |
| piqa | 1 | none | 0 | acc ↑ | 0.7051 | 0.6861 | 0.6801 |
| winogrande | 1 | none | 0 | acc ↑ | 0.5635 | 0.5762 | 0.5659 |
| **average** | | | | acc ↑ | 0.5390 | 0.5245 | 0.5237 |

To reproduce these evals, see this colab.

Note: This model is fake-quantized, and the scaling vectors are fused into the weights for ease of evaluation, so the stored weights are float16 and have more than 16 unique values. See the colab above for how to convert them back to weights with 16 unique values.
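To illustrate what "fake-quantized with fused scales" means, here is a minimal NumPy sketch (not the model's actual conversion code, which lives in the colab): weights are rounded to 16 per-tensor levels but kept as floats, a hypothetical per-row scaling vector is fused in (pushing the tensor past 16 unique values), and dividing that vector back out recovers the 16-level tensor.

```python
import numpy as np

def fake_quantize_per_tensor(w, n_bits=4):
    # Symmetric per-tensor fake quantization: one scale for the whole tensor.
    qmax = 2 ** (n_bits - 1) - 1                       # 7 for signed 4-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer codes in [-8, 7]
    return q * scale                                   # still float, <= 16 levels

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)

wq = fake_quantize_per_tensor(w)
assert len(np.unique(wq)) <= 16        # true 4-bit: at most 16 distinct values

# Fusing a (hypothetical) per-row scaling vector into the weights makes the
# stored float tensor exceed 16 unique values, as the note above describes...
s = rng.uniform(0.5, 2.0, size=(4, 1)).astype(np.float32)
fused = wq * s
assert len(np.unique(fused)) > 16

# ...but dividing the scaling vector back out restores the 16-level weights.
recovered = fused / s
assert np.allclose(recovered, wq)
```

The fused form is convenient for evaluation because a stock float16 forward pass uses the weights directly, with no custom dequantization kernel.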

