Output + Embedding

The suffix on each variant name encodes the precision kept for the output and token-embedding tensors:

| Output + Embedding | 2-bit | 3-bit | 4-bit | 5-bit | 6-bit | 8-bit | 16-bit | 32-bit |
| ------------------ | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ------ |
| Suffix             | AXL   | BXL   | CXL   | DXL   | EXL   | FXL   | GXL    | HXL    |
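
If you want to reproduce an FXL-style file yourself, a hedged sketch of how it might be done with llama.cpp's `llama-quantize` tool and its `--output-tensor-type` / `--token-embedding-type` flags; the paths and file names below are placeholders, not the exact commands used for this repo:

```python
# Hypothetical sketch: build an "FXL"-style file (8-bit output and token-embedding
# tensors on top of a Q4_K_M body) by shelling out to llama.cpp's llama-quantize.
# All paths are placeholders.
import subprocess

subprocess.run(
    [
        "llama-quantize",
        "--output-tensor-type", "q8_0",    # 8-bit output tensor   -> "F" in the suffix
        "--token-embedding-type", "q8_0",  # 8-bit token embedding -> "F" in the suffix
        "Hunyuan-7B-Instruct-BF16.gguf",        # placeholder source model
        "Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf",  # placeholder output file
        "Q4_K_M",                          # quant type for the remaining weights
    ],
    check=True,
)
```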

Master table

| Bits | Variant    | Size (GB) | BPW   | PPL    | PPL error |
| ---- | ---------- | --------- | ----- | ------ | --------- |
| 1    | IQ1_M_FXL  | 2.19      | 2.32  | 1.8967 | 0.01174   |
| 1    | IQ1_M_GXL  | 2.68      | 2.85  | 1.8969 | 0.01175   |
| 1    | IQ1_M_HXL  | 3.73      | 3.97  | 1.8965 | 0.01174   |
| 2    | Q2_K_FXL   | 3.13      | 3.33  | 1.6234 | 0.00922   |
| 2    | Q2_K_GXL   | 3.63      | 3.86  | 1.6234 | 0.00922   |
| 2    | Q2_K_HXL   | 4.68      | 4.98  | 1.6234 | 0.00922   |
| 3    | Q3_K_M_FXL | 3.92      | 4.17  | 1.5674 | 0.00864   |
| 3    | Q3_K_M_GXL | 4.41      | 4.70  | 1.5674 | 0.00864   |
| 3    | Q3_K_M_HXL | 5.46      | 5.81  | 1.5672 | 0.00864   |
| 4    | Q4_K_M_FXL | 4.75      | 5.06  | 1.5567 | 0.00852   |
| 4    | Q4_K_M_GXL | 5.24      | 5.58  | 1.5570 | 0.00853   |
| 4    | Q4_K_M_HXL | 6.29      | 6.70  | 1.5566 | 0.00852   |
| 5    | Q5_K_M_FXL | 5.50      | 5.85  | 1.5572 | 0.00855   |
| 5    | Q5_K_M_GXL | 5.99      | 6.38  | 1.5574 | 0.00856   |
| 5    | Q5_K_M_HXL | 7.04      | 7.50  | 1.5570 | 0.00855   |
| 6    | Q6_K_FXL   | 6.29      | 6.70  | 1.5525 | 0.00848   |
| 6    | Q6_K_GXL   | 6.78      | 7.22  | 1.5524 | 0.00848   |
| 6    | Q6_K_HXL   | 7.83      | 8.34  | 1.5523 | 0.00848   |
| 8    | Q8_0_GXL   | 8.47      | 9.03  | 1.5515 | 0.00847   |
| 8    | Q8_0_HXL   | 9.52      | 10.14 | 1.5514 | 0.00847   |
| 16   | BF16       | 15.00     | 16.00 | 1.5523 | 0.00848   |
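
Size and BPW are tied together by simple arithmetic: file size ≈ BPW × parameter count / 8. Hunyuan-7B-Instruct has roughly 7.5B parameters, so the Size column can be sanity-checked in a couple of lines (GB here meaning 10^9 bytes):

```python
# Rough sanity check of the Size column: file size ≈ BPW × parameter count / 8.
N_PARAMS = 7.5e9  # approximate parameter count of Hunyuan-7B-Instruct

def size_gb(bpw: float, n_params: float = N_PARAMS) -> float:
    return bpw * n_params / 8 / 1e9

print(size_gb(5.06))   # ~4.74 GB, matching the Q4_K_M_FXL row to within rounding
print(size_gb(16.00))  # 15.00 GB, matching the BF16 row
```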

Variant chooser (prefer FXL first)

| Variant (preferred) | Size (GB) | Quality vs BF16 | Inference speed | Long-context headroom |
| ------------------- | --------- | --------------- | --------------- | --------------------- |
| IQ1_M_FXL  | 2.19  | Low       | Fastest   | Excellent  |
| Q2_K_FXL   | 3.13  | Fair      | Very fast | Excellent  |
| Q3_K_M_FXL | 3.92  | Good      | Fast      | Very good  |
| Q4_K_M_FXL | 4.75  | Excellent | Fast      | Good       |
| Q5_K_M_FXL | 5.50  | Excellent | Medium    | Good       |
| Q6_K_FXL   | 6.29  | Excellent | Medium    | OK         |
| Q8_0_GXL   | 8.47  | Excellent | Slower    | Tight      |
| BF16       | 15.00 | Reference | Slowest   | Very tight |

Quick picks by GPU VRAM

| GPU VRAM | Pick | Why |
| -------- | ---- | --- |
| 16 GB | Q6_K_FXL or Q4_K_M_FXL | Same quality as BF16 in the PPL table above, with plenty of room for long context or batching. |
| 12 GB | Q4_K_M_FXL | Best balance on 12 GB: strong quality with good headroom. |
| 8 GB  | Q4_K_M_FXL (default), or Q3_K_M_FXL for longer context | Q4 runs well; drop to Q3 if you need more KV cache (rough estimate below). |
| 6 GB  | Q3_K_M_FXL | Usually fits with comfortable headroom. Use Q4_K_M_FXL only for short contexts. |
| 4 GB  | Q2_K_FXL first, IQ1_M_FXL as a last resort | Fits strict limits; accept the quality hit as needed. |
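
The KV-cache remarks above can be ballparked with the usual estimate: weights plus 2 × layers × KV heads × head dim × context × bytes per element. The architecture numbers below are placeholders, not Hunyuan-7B's actual config; read the real values from the GGUF metadata before trusting the result:

```python
# Back-of-the-envelope VRAM check: weights + KV cache must fit with some slack
# left for activations and runtime overhead. Architecture numbers are PLACEHOLDERS.
def kv_cache_gb(n_ctx: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elt: int = 2) -> float:
    # One K and one V entry per layer, per KV head, per position (f16 by default).
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elt / 1e9

weights_gb = 4.75  # Q4_K_M_FXL file size from the master table
kv_gb = kv_cache_gb(n_ctx=8192, n_layers=32, n_kv_heads=8, head_dim=128)
print(f"~{weights_gb + kv_gb:.1f} GB before runtime overhead")  # ~5.8 GB with these guesses
```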

Notes

  • Preference order for size at equal quality: FXL first, then GXL, then HXL.
  • If you need more context headroom, drop one quant level rather than squeezing heavier weights into the remaining VRAM.
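
To put the picks above into practice, a minimal loading sketch, assuming llama-cpp-python and using a placeholder file name for whichever variant you downloaded:

```python
# Minimal sketch: load a downloaded variant with llama-cpp-python.
# The file name is a placeholder; match it to the variant you chose above.
from llama_cpp import Llama

llm = Llama(
    model_path="Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload every layer to the GPU if it fits
    n_ctx=8192,       # raise only if your VRAM headroom allows it
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```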