## Output + Embedding quantization
| | 2-bit | 3-bit | 4-bit | 5-bit | 6-bit | 8-bit | 16-bit | 32-bit |
|---|---|---|---|---|---|---|---|---|
| Variant suffix | AXL | BXL | CXL | DXL | EXL | FXL | GXL | HXL |
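The suffix encodes how the output and embedding tensors are quantized. A minimal Python sketch of that mapping (the helper name is illustrative, not something shipped with this repo):

```python
# Maps a variant suffix from the table above to the output/embedding
# bit width. Helper name is illustrative, not part of this repo.
SUFFIX_BITS = {
    "AXL": 2, "BXL": 3, "CXL": 4, "DXL": 5,
    "EXL": 6, "FXL": 8, "GXL": 16, "HXL": 32,
}

def output_embedding_bits(variant: str) -> int:
    """e.g. 'Q4_K_M_FXL' -> 8, 'IQ1_M_HXL' -> 32."""
    return SUFFIX_BITS[variant.rsplit("_", 1)[-1]]

assert output_embedding_bits("Q4_K_M_FXL") == 8
assert output_embedding_bits("IQ1_M_HXL") == 32
```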
## Master table
Bits | Variant | Size (GB) | BPW | PPL | PPL error |
---|---|---|---|---|---|
1 | IQ1_M_FXL | 2.19 | 2.32 | 1.8967 | 0.01174 |
1 | IQ1_M_GXL | 2.68 | 2.85 | 1.8969 | 0.01175 |
1 | IQ1_M_HXL | 3.73 | 3.97 | 1.8965 | 0.01174 |
2 | Q2_K_FXL | 3.13 | 3.33 | 1.6234 | 0.00922 |
2 | Q2_K_GXL | 3.63 | 3.86 | 1.6234 | 0.00922 |
2 | Q2_K_HXL | 4.68 | 4.98 | 1.6234 | 0.00922 |
3 | Q3_K_M_FXL | 3.92 | 4.17 | 1.5674 | 0.00864 |
3 | Q3_K_M_GXL | 4.41 | 4.70 | 1.5674 | 0.00864 |
3 | Q3_K_M_HXL | 5.46 | 5.81 | 1.5672 | 0.00864 |
4 | Q4_K_M_FXL | 4.75 | 5.06 | 1.5567 | 0.00852 |
4 | Q4_K_M_GXL | 5.24 | 5.58 | 1.5570 | 0.00853 |
4 | Q4_K_M_HXL | 6.29 | 6.70 | 1.5566 | 0.00852 |
5 | Q5_K_M_FXL | 5.50 | 5.85 | 1.5572 | 0.00855 |
5 | Q5_K_M_GXL | 5.99 | 6.38 | 1.5574 | 0.00856 |
5 | Q5_K_M_HXL | 7.04 | 7.50 | 1.5570 | 0.00855 |
6 | Q6_K_FXL | 6.29 | 6.70 | 1.5525 | 0.00848 |
6 | Q6_K_GXL | 6.78 | 7.22 | 1.5524 | 0.00848 |
6 | Q6_K_HXL | 7.83 | 8.34 | 1.5523 | 0.00848 |
8 | Q8_0_GXL | 8.47 | 9.03 | 1.5515 | 0.00847 |
8 | Q8_0_HXL | 9.52 | 10.14 | 1.5514 | 0.00847 |
16 | BF16 | 15.00 | 16.00 | 1.5523 | 0.00848 |
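The Size and BPW columns tie together as file size ≈ parameter count × BPW / 8. A quick sanity-check sketch, inferring the parameter count from the BF16 row; this is a reading of the table, not an official figure:

```python
# Sanity check: file size ≈ n_params * BPW / 8 bytes.
# N_PARAMS is inferred from the BF16 row (15.00 GB at 16.00 BPW);
# it is a reading of the table, not an official parameter count.
N_PARAMS = 15.00e9 * 8 / 16.00  # ≈ 7.5e9

def estimated_size_gb(bpw: float) -> float:
    """Estimate GGUF file size in GB from bits-per-weight."""
    return N_PARAMS * bpw / 8 / 1e9

# Reproduces the Q4_K_M_FXL row within rounding: ~4.74 vs 4.75 GB listed.
print(round(estimated_size_gb(5.06), 2))
```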
## Variant chooser (prefer FXL first)
Variant (preferred) | Size (GB) | Quality vs BF16 | Inference speed | Long context headroom |
---|---|---|---|---|
IQ1_M_FXL | 2.19 | Low | Fastest | Excellent |
Q2_K_FXL | 3.13 | Fair | Very fast | Excellent |
Q3_K_M_FXL | 3.92 | Good | Fast | Very good |
Q4_K_M_FXL | 4.75 | Excellent | Fast | Good |
Q5_K_M_FXL | 5.50 | Excellent | Medium | Good |
Q6_K_FXL | 6.29 | Excellent | Medium | OK |
Q8_0_GXL | 8.47 | Excellent | Slower | Tight |
BF16 | 15.00 | Reference | Slowest | Very tight |
## Quick picks by GPU VRAM
GPU VRAM | Pick | Why |
---|---|---|
16 GB | Q6_K_FXL or Q4_K_M_FXL | Quality matches BF16 in the PPL table above, with plenty of room for long context or batching. |
12 GB | Q4_K_M_FXL | Best balance on 12 GB, strong quality with good headroom. |
8 GB | Q4_K_M_FXL (default) or Q3_K_M_FXL for longer ctx | Q4 runs well; drop to Q3 if you need more KV cache. |
6 GB | Q3_K_M_FXL | Usually fits with comfortable headroom. Use Q4_K_M_FXL only for short ctx. |
4 GB | Q2_K_FXL first, IQ1_M_FXL last resort | Fits strict limits. Accept the quality hit as needed. |
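Beyond file size, the main variable in these picks is the KV cache. A rough fit-check sketch; the layer/head/dimension defaults below are placeholders, not Hunyuan-7B's actual config, so read the real values from the GGUF metadata before trusting the result:

```python
# Rough VRAM-fit check: weights + KV cache + fixed overhead vs. available VRAM.
# KV cache uses the generic formula
#   2 (K and V) * n_layers * n_kv_heads * head_dim * ctx * bytes_per_element.
def kv_cache_gb(ctx: int,
                n_layers: int = 32,       # assumption, check GGUF metadata
                n_kv_heads: int = 8,      # assumption (GQA), check metadata
                head_dim: int = 128,      # assumption, check metadata
                bytes_per_elem: int = 2   # fp16 KV cache
                ) -> float:
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1e9

def fits(model_gb: float, ctx: int, vram_gb: float,
         overhead_gb: float = 1.0) -> bool:
    """Crude check; overhead covers CUDA context, activations, etc."""
    return model_gb + kv_cache_gb(ctx) + overhead_gb <= vram_gb

# Example: does Q4_K_M_FXL (4.75 GB) with an 8k context fit in 8 GB?
print(fits(4.75, 8192, 8.0))
```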
## Notes
- Preference order for size at equal quality: FXL first, then GXL, then HXL.
- If you need more context headroom, drop one quant level rather than keeping a heavier quant and starving the KV cache.
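Once you have picked a variant, a minimal sketch for fetching and running it with llama-cpp-python; the exact GGUF filename inside the repo is an assumption, so check the repo's file list:

```python
# Fetch one variant from the repo and run a short chat completion.
# The filename below is assumed, not verified against the repo's file list.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="marcelone/Hunyuan-7B-Instruct-GGUF",
    filename="Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf",  # assumed filename
)
llm = Llama(model_path=path, n_ctx=8192, n_gpu_layers=-1)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```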
## Model tree for marcelone/Hunyuan-7B-Instruct-GGUF
- Base model: tencent/Hunyuan-7B-Pretrain
- Finetuned: tencent/Hunyuan-7B-Instruct