## Output + Embedding quantization
| | 2-bit | 3-bit | 4-bit | 5-bit | 6-bit | 8-bit | 16-bit | 32-bit |
|---|---|---|---|---|---|---|---|---|
| Variant suffix | AXL | BXL | CXL | DXL | EXL | FXL | GXL | HXL |
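The suffix encodes how the output and embedding tensors are quantized. A minimal Python sketch of that mapping (the helper name is illustrative, not something shipped with this repo):

```python
# Maps a variant suffix from the table above to the output/embedding
# bit width. Helper name is illustrative, not part of this repo.
SUFFIX_BITS = {
    "AXL": 2, "BXL": 3, "CXL": 4, "DXL": 5,
    "EXL": 6, "FXL": 8, "GXL": 16, "HXL": 32,
}

def output_embedding_bits(variant: str) -> int:
    """e.g. 'Q4_K_M_FXL' -> 8, 'IQ1_M_HXL' -> 32."""
    return SUFFIX_BITS[variant.rsplit("_", 1)[-1]]

assert output_embedding_bits("Q4_K_M_FXL") == 8
assert output_embedding_bits("IQ1_M_HXL") == 32
```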
## Master table
Bits | Variant | Size (GB) | BPW | PPL | PPL error |
---|---|---|---|---|---|
1 | IQ1_M_FXL | 2.19 | 2.32 | 1.8967 | 0.01174 |
1 | IQ1_M_GXL | 2.68 | 2.85 | 1.8969 | 0.01175 |
1 | IQ1_M_HXL | 3.73 | 3.97 | 1.8965 | 0.01174 |
2 | Q2_K_FXL | 3.13 | 3.33 | 1.6234 | 0.00922 |
2 | Q2_K_GXL | 3.63 | 3.86 | 1.6234 | 0.00922 |
2 | Q2_K_HXL | 4.68 | 4.98 | 1.6234 | 0.00922 |
3 | Q3_K_M_FXL | 3.92 | 4.17 | 1.5674 | 0.00864 |
3 | Q3_K_M_GXL | 4.41 | 4.70 | 1.5674 | 0.00864 |
3 | Q3_K_M_HXL | 5.46 | 5.81 | 1.5672 | 0.00864 |
4 | Q4_K_M_FXL | 4.75 | 5.06 | 1.5567 | 0.00852 |
4 | Q4_K_M_GXL | 5.24 | 5.58 | 1.5570 | 0.00853 |
4 | Q4_K_M_HXL | 6.29 | 6.70 | 1.5566 | 0.00852 |
5 | Q5_K_M_FXL | 5.50 | 5.85 | 1.5572 | 0.00855 |
5 | Q5_K_M_GXL | 5.99 | 6.38 | 1.5574 | 0.00856 |
5 | Q5_K_M_HXL | 7.04 | 7.50 | 1.5570 | 0.00855 |
6 | Q6_K_FXL | 6.29 | 6.70 | 1.5525 | 0.00848 |
6 | Q6_K_GXL | 6.78 | 7.22 | 1.5524 | 0.00848 |
6 | Q6_K_HXL | 7.83 | 8.34 | 1.5523 | 0.00848 |
8 | Q8_0_GXL | 8.47 | 9.03 | 1.5515 | 0.00847 |
8 | Q8_0_HXL | 9.52 | 10.14 | 1.5514 | 0.00847 |
16 | BF16 | 15.00 | 16.00 | 1.5523 | 0.00848 |
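The Size and BPW columns tie together as file size ≈ parameter count × BPW / 8. A quick sanity-check sketch, inferring the parameter count from the BF16 row; this is a reading of the table, not an official figure:

```python
# Sanity check: file size ≈ n_params * BPW / 8 bytes.
# N_PARAMS is inferred from the BF16 row (15.00 GB at 16.00 BPW);
# it is a reading of the table, not an official parameter count.
N_PARAMS = 15.00e9 * 8 / 16.00  # ≈ 7.5e9

def estimated_size_gb(bpw: float) -> float:
    """Estimate GGUF file size in GB from bits-per-weight."""
    return N_PARAMS * bpw / 8 / 1e9

# Reproduces the Q4_K_M_FXL row within rounding: ~4.74 vs 4.75 GB listed.
print(round(estimated_size_gb(5.06), 2))
```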
## Variant chooser (prefer FXL first)
Variant (preferred) | Size (GB) | Quality vs BF16 | Inference speed | Long context headroom |
---|---|---|---|---|
IQ1_M_FXL | 2.19 | Low | Fastest | Excellent |
Q2_K_FXL | 3.13 | Fair | Very fast | Excellent |
Q3_K_M_FXL | 3.92 | Good | Fast | Very good |
Q4_K_M_FXL | 4.75 | Excellent | Fast | Good |
Q5_K_M_FXL | 5.50 | Excellent | Medium | Good |
Q6_K_FXL | 6.29 | Excellent | Medium | OK |
Q8_0_GXL | 8.47 | Excellent | Slower | Tight |
BF16 | 15.00 | Reference | Slowest | Very tight |
## Quick picks by GPU VRAM
GPU VRAM | Pick | Why |
---|---|---|
16 GB | Q6_K_FXL or Q4_K_M_FXL | Quality matches BF16 in the PPL table above, with plenty of room for long context or batching. |
12 GB | Q4_K_M_FXL | Best balance on 12 GB, strong quality with good headroom. |
8 GB | Q4_K_M_FXL (default) or Q3_K_M_FXL for longer ctx | Q4 runs well; drop to Q3 if you need more KV cache. |
6 GB | Q3_K_M_FXL | Usually fits with comfortable headroom. Use Q4_K_M_FXL only for short ctx. |
4 GB | Q2_K_FXL first, IQ1_M_FXL last resort | Fits strict limits. Accept the quality hit as needed. |
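Beyond file size, the main variable in these picks is the KV cache. A rough fit-check sketch; the layer/head/dimension defaults below are placeholders, not Hunyuan-7B's actual config, so read the real values from the GGUF metadata before trusting the result:

```python
# Rough VRAM-fit check: weights + KV cache + fixed overhead vs. available VRAM.
# KV cache uses the generic formula
#   2 (K and V) * n_layers * n_kv_heads * head_dim * ctx * bytes_per_element.
def kv_cache_gb(ctx: int,
                n_layers: int = 32,       # assumption, check GGUF metadata
                n_kv_heads: int = 8,      # assumption (GQA), check metadata
                head_dim: int = 128,      # assumption, check metadata
                bytes_per_elem: int = 2   # fp16 KV cache
                ) -> float:
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1e9

def fits(model_gb: float, ctx: int, vram_gb: float,
         overhead_gb: float = 1.0) -> bool:
    """Crude check; overhead covers CUDA context, activations, etc."""
    return model_gb + kv_cache_gb(ctx) + overhead_gb <= vram_gb

# Example: does Q4_K_M_FXL (4.75 GB) with an 8k context fit in 8 GB?
print(fits(4.75, 8192, 8.0))
```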
## Notes
- Preference order for size at equal quality: FXL first, then GXL, then HXL.
- If you need more context headroom, drop one quant level rather than keeping a heavier quant and starving the KV cache.
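Once you have picked a variant, a minimal sketch for fetching and running it with llama-cpp-python; the exact GGUF filename inside the repo is an assumption, so check the repo's file list:

```python
# Fetch one variant from the repo and run a short chat completion.
# The filename below is assumed, not verified against the repo's file list.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="marcelone/Hunyuan-7B-Instruct-GGUF",
    filename="Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf",  # assumed filename
)
llm = Llama(model_path=path, n_ctx=8192, n_gpu_layers=-1)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```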
## Model tree for marcelone/Hunyuan-7B-Instruct-GGUF
- Base model: tencent/Hunyuan-7B-Pretrain
- Finetuned: tencent/Hunyuan-7B-Instruct