Output + Embedding | 2-bit | 3-bit | 4-bit | 5-bit | 6-bit | 8-bit | 16-bit | 32-bit |
---|---|---|---|---|---|---|---|---|
Suffix | AXL | BXL | CXL | DXL | EXL | FXL | GXL | HXL |
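The suffix scheme above can be read mechanically from a variant name. The snippet below is a minimal sketch of that mapping; the function name `split_variant` is hypothetical and not part of any tool shipped with this repository.

```python
# Suffix letters from the table above: precision (in bits) of the
# output and embedding tensors.
SUFFIX_BITS = {
    "AXL": 2, "BXL": 3, "CXL": 4, "DXL": 5,
    "EXL": 6, "FXL": 8, "GXL": 16, "HXL": 32,
}

def split_variant(name):
    """Split e.g. 'Q6_K_FXL' into ('Q6_K', 8).

    Names without a recognized suffix (such as 'BF16' or 'F32')
    are returned unchanged with None for the bit width.
    """
    base, sep, suffix = name.rpartition("_")
    if sep and suffix in SUFFIX_BITS:
        return base, SUFFIX_BITS[suffix]
    return name, None

for variant in ["IQ3_M_FXL", "Q8_0_GXL", "BF16_HXL", "BF16"]:
    print(variant, "->", split_variant(variant))
```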
Master Table
Variant | Size (GB) | BPW | PPL | PPL error |
---|---|---|---|---|
IQ3_M_FXL | 2.06 | 4.08 | 1.9788 | 0.01061 |
IQ3_M_GXL | 2.42 | 4.80 | 1.9785 | 0.01061 |
IQ3_M_HXL | 3.20 | 6.35 | 1.9784 | 0.01061 |
IQ4_XS_FLX | 2.36 | 4.69 | 1.9284 | 0.01018 |
IQ4_XS_GXL | 2.73 | 5.42 | 1.9282 | 0.01018 |
IQ4_XS_HXL | 3.51 | 6.96 | 1.9282 | 0.01018 |
IQ4_NL_GXL | 2.84 | 5.64 | 1.9307 | 0.01024 |
IQ4_NL_HXL | 3.62 | 7.18 | 1.9305 | 0.01023 |
Q4_K_M_GXL | 2.96 | 5.87 | 1.9477 | 0.01047 |
Q4_K_M_HXL | 3.73 | 7.41 | 1.9475 | 0.01047 |
Q5_K_M_FXL | 2.98 | 5.92 | 1.9260 | 0.01024 |
Q5_K_M_GXL | 3.35 | 6.65 | 1.9259 | 0.01023 |
Q5_K_M_HXL | 4.13 | 8.19 | 1.9257 | 0.01023 |
Q6_K_FXL | 3.40 | 6.75 | 1.9211 | 0.01018 |
Q6_K_GXL | 3.77 | 7.48 | 1.9207 | 0.01018 |
Q6_K_HXL | 4.54 | 9.02 | 1.9206 | 0.01017 |
Q8_0_GXL | 4.65 | 9.23 | 1.9245 | 0.01026 |
Q8_0_HXL | 5.42 | 10.77 | 1.9241 | 0.01025 |
BF16 | 8.05 | 16.00 | 1.9233 | 0.01024 |
BF16_HXL | 8.83 | 17.55 | 1.9231 | 0.01024 |
F32 | 16.10 | 32.00 | 1.9232 | 0.01024 |
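Size and BPW (bits per weight) in the master table are tied together, since a file holds roughly parameter count × BPW / 8 bytes. As a rough sketch, a size can be estimated by scaling the BF16 row by the ratio of BPW values. This ignores metadata overhead and rounding, and the helper name `estimate_size_gb` is only illustrative.

```python
# Reference point taken from the BF16 row of the master table.
BF16_SIZE_GB = 8.05
BF16_BPW = 16.00

def estimate_size_gb(bpw):
    """Estimate file size by scaling the BF16 size with bits per weight."""
    return BF16_SIZE_GB * bpw / BF16_BPW

# BPW values copied from the master table.
for name, bpw in [("Q6_K_HXL", 9.02), ("F32", 32.00)]:
    print(f"{name}: about {estimate_size_gb(bpw):.2f} GB")
```

The estimates should agree with the listed sizes to within about 0.01 GB, which makes this a quick sanity check when comparing variants.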
Variant chooser, prefer FXL first
(these are my personal notes to help you choose)
Variant (preferred) | Size (GB) | Quality vs BF16 | Inference speed | Long context headroom | My notes to you |
---|---|---|---|---|---|
IQ3_M_FXL | 2.06 | Low | Very fast | Excellent | I reach 76.33 tok/sec at 32k and 61.28 tok/sec at 64k. Use it when you must fit very tight limits. |
IQ4_XS_FLX | 2.36 | Very good | Fast | Very good | I like this as a small yet stable 4-bit. If you need more raw speed, try IQ4_XS_GXL. |
Q5_K_M_FXL | 2.98 | Very good | Medium fast | Very good | I use this when I want sturdier outputs than 4-bit with almost no size penalty. |
Q6_K_FXL | 3.40 | Excellent | Medium fast | Very good | I lean on this for balanced quality, speed, and long contexts. |
Q8_0_GXL | 4.65 | Excellent | Medium | OK | In my tests it kept high quality, at 54.21 tok/sec at 16k and 52.04 tok/sec at 32k. |
BF16 | 8.05 | Reference | Slow | Tight | I use BF16 when I want very high quality without going full F32. |
Quick picks by GPU VRAM
(again, these are personal notes from my RTX 3060 12 GB with 48 GB RAM)
GPU VRAM | Pick | Why I recommend it |
---|---|---|
16 GB | Q6_K_GXL or Q8_0_GXL, consider BF16 for near-best quality | I get near-BF16 quality with room for long context or batching. BF16 fits but leaves less headroom. |
12 GB | Q6_K_GXL for balance, or Q8_0_GXL for quality focus | On my 3060 12 GB these give strong quality and good 32k performance. |
8 GB | IQ4_XS_GXL for speed, Q5_K_M_GXL for sturdier outputs | Both leave comfortable KV space for longer contexts in my runs. |
6 GB | Q5_K_M_FXL or IQ4_XS_FLX | I find these the safest balance when memory is tight. |
4 GB | IQ3_M_FXL first, IQ4_XS_FLX if your context still fits | IQ3_M_FXL gives you the best chance of running under strict limits. |
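The quick picks above can be expressed as a small lookup. This is only a sketch of the table, with two assumptions of mine: a card that falls between two tiers is treated as the lower tier, and anything under 4 GB returns no pick. The name `quick_picks` is hypothetical.

```python
# (minimum VRAM in GB, suggested variants), copied from the quick-picks table.
# BF16 is left out because the table lists it only as an optional extra at 16 GB.
QUICK_PICKS = [
    (16, ["Q6_K_GXL", "Q8_0_GXL"]),
    (12, ["Q6_K_GXL", "Q8_0_GXL"]),
    (8, ["IQ4_XS_GXL", "Q5_K_M_GXL"]),
    (6, ["Q5_K_M_FXL", "IQ4_XS_FLX"]),
    (4, ["IQ3_M_FXL", "IQ4_XS_FLX"]),
]

def quick_picks(vram_gb):
    """Return the suggested variants for a given amount of GPU VRAM."""
    for min_vram, variants in QUICK_PICKS:
        if vram_gb >= min_vram:
            return variants
    return []  # assumption: below 4 GB the table offers no suggestion

print(quick_picks(12))
print(quick_picks(10))  # between tiers, so the 8 GB row applies
```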
PROMPTS
Lesson
Create a C1 level dialogue for language learning. First, provide the introduction to the dialogue in English, then the dialogue in Italian. Then, an English translation of the dialogue. Next, there is a vocabulary list of all the Italian words used in the text, including their gender, class, and English meaning. Point out which words are more frequent, as they should be memorized for mastering that level. The grammar part should explain the grammar used in the text and present some grammar patterns that should be memorized as they are essential for mastering that level. Focus on explaining for students. Then, create a translation exercise section using sentences from the text for English to Italian translation.
Conversation Practice
I'd like to role-play. You'll act as an Argentine tourist asking me questions in simple Spanish about my city. Please keep the language at B1 level throughout.
You are "Ana", a Brazilian tourist. Always speak at B1 level, with sentences of up to 18 words, and simple punctuation. Do not invent facts about the user. If you are not sure, say "I do not know" and ask. Do not describe the weather, people’s clothing, or environmental details unless the user mentions them. Avoid ending the conversation with farewells unless the user ends it. Do not use em dashes; prefer commas, parentheses, or colons.
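When comparing variants with the "Ana" prompt, it can help to check replies against the two rules that are easy to measure: sentences of up to 18 words and no em dashes. The checker below is a rough sketch under my own assumptions. It splits sentences on `.`, `!`, and `?` and counts whitespace-separated words, and `check_reply` is a hypothetical helper, not part of the model card's tooling.

```python
import re

MAX_WORDS = 18  # sentence length limit stated in the "Ana" prompt

def check_reply(text):
    """Return a list of rule violations found in a model reply."""
    problems = []
    if "\u2014" in text:  # U+2014 is the em dash
        problems.append("contains an em dash")
    # Assumption: a sentence ends at '.', '!' or '?'.
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()
        if len(words) > MAX_WORDS:
            preview = " ".join(words[:5])
            problems.append(f"sentence has {len(words)} words: {preview}...")
    return problems

print(check_reply("Hello, I am Ana. Where is the museum?"))
```

An empty list means the reply passed both checks. The other rules in the prompt (B1 vocabulary, no invented facts) still need a human reader.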
Model tree for marcelone/Qwen3-4B-Instruct-2507-gguf
Base model
Qwen/Qwen3-4B-Instruct-2507