Output + Embedding

| Output + Embedding bits | Suffix |
|---|---|
| 2-bit  | AXL |
| 3-bit  | BXL |
| 4-bit  | CXL |
| 5-bit  | DXL |
| 6-bit  | EXL |
| 8-bit  | FXL |
| 16-bit | GXL |
| 32-bit | HXL |

Master Table

| Variant | Size (GB) | BPW | PPL | PPL error |
|---|---|---|---|---|
| IQ3_M_FXL | 2.06 | 4.08 | 1.9788 | 0.01061 |
| IQ3_M_GXL | 2.42 | 4.80 | 1.9785 | 0.01061 |
| IQ3_M_HXL | 3.20 | 6.35 | 1.9784 | 0.01061 |
| IQ4_XS_FXL | 2.73 | 4.69 | 1.9284 | 0.01018 |
| IQ4_XS_GXL | 2.36 | 5.42 | 1.9282 | 0.01018 |
| IQ4_XS_HXL | 3.51 | 6.96 | 1.9282 | 0.01018 |
| IQ4_NL_GXL | 2.84 | 5.64 | 1.9307 | 0.01024 |
| IQ4_NL_HXL | 3.62 | 7.18 | 1.9305 | 0.01023 |
| Q4_K_M_GXL | 2.96 | 5.87 | 1.9477 | 0.01047 |
| Q4_K_M_HXL | 3.73 | 7.41 | 1.9475 | 0.01047 |
| Q5_K_M_FXL | 2.98 | 5.92 | 1.9260 | 0.01024 |
| Q5_K_M_GXL | 3.35 | 6.65 | 1.9259 | 0.01023 |
| Q5_K_M_HXL | 4.13 | 8.19 | 1.9257 | 0.01023 |
| Q6_K_FXL | 3.40 | 6.75 | 1.9211 | 0.01018 |
| Q6_K_GXL | 3.77 | 7.48 | 1.9207 | 0.01018 |
| Q6_K_HXL | 4.54 | 9.02 | 1.9206 | 0.01017 |
| Q8_0_GXL | 4.65 | 9.23 | 1.9245 | 0.01026 |
| Q8_0_HXL | 5.42 | 10.77 | 1.9241 | 0.01025 |
| BF16 | 8.05 | 16.00 | 1.9233 | 0.01024 |
| BF16_HXL | 8.83 | 17.55 | 1.9231 | 0.01024 |
| F32 | 16.10 | 32.00 | 1.9232 | 0.01024 |
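
If you want to sanity-check how Size (GB) relates to BPW in the table above, here is a minimal Python sketch. The 4.02B parameter count and the BPW values are taken from this card; the real on-disk sizes can differ slightly because GGUF files also store metadata and non-weight tensors.

```python
# Rough sanity check: size_GB ~= BPW * n_params / 8 / 1e9
# n_params (4.02B) is taken from this card; small deviations are expected
# because GGUF files also store metadata and non-weight tensors.
N_PARAMS = 4.02e9

variants = {
    "IQ3_M_FXL": 4.08,   # BPW values copied from the master table
    "IQ4_XS_FXL": 4.69,
    "Q6_K_FXL": 6.75,
    "Q8_0_GXL": 9.23,
    "BF16": 16.00,
}

for name, bpw in variants.items():
    est_gb = bpw * N_PARAMS / 8 / 1e9
    print(f"{name:12s} {bpw:5.2f} bpw  ~ {est_gb:.2f} GB")
```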

Variant chooser (prefer FXL first)

(these are my personal notes to help you choose)

| Variant (preferred) | Size (GB) | Quality vs BF16 | Inference speed | Long-context headroom | My notes to you |
|---|---|---|---|---|---|
| IQ3_M_FXL | 2.06 | Low | Very fast | Excellent | I reach 76.33 tok/sec at 32k and 61.28 tok/sec at 64k. Use it when you must fit within very tight memory limits. |
| IQ4_XS_FXL | 2.73 | Very good | Fast | Very good | I like this as a small yet stable 4-bit. If you need more raw speed, try IQ4_XS_GXL. |
| Q5_K_M_FXL | 2.98 | Very good | Medium fast | Very good | I use this when I want sturdier outputs than 4-bit with almost no size penalty. |
| Q6_K_FXL | 3.40 | Excellent | Medium fast | Very good | I lean on this for balanced quality, speed, and long contexts. |
| Q8_0_GXL | 4.65 | Excellent | Medium | OK | In my tests it kept high quality: 54.21 tok/sec at 16k and 52.04 at 32k. |
| BF16 | 8.05 | Reference | Slow | Tight | I use BF16 when I want very high quality without going full F32. |

Quick picks by GPU VRAM

(again, these are personal notes from my RTX 3060 12 GB with 48 GB RAM)

| GPU VRAM | Pick | Why I recommend it |
|---|---|---|
| 16 GB | Q6_K_GXL or Q8_0_GXL; consider BF16 for near-best quality | I get near-BF16 quality with room for long context or batching. BF16 fits but leaves less headroom. |
| 12 GB | Q6_K_GXL for balance, or Q8_0_GXL for quality focus | On my 3060 12 GB these give strong quality and good 32k performance. |
| 8 GB | IQ4_XS_GXL for speed, Q5_K_M_GXL for sturdier outputs | Both leave comfortable KV-cache space for longer contexts in my runs. |
| 6 GB | Q5_K_M_FXL or IQ4_XS_FXL | I find these the safest balance when memory is tight. |
| 4 GB | IQ3_M_FXL first, IQ4_XS_FXL if your context still fits | IQ3_M_FXL gives me the best chance of running under strict limits. |
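
If one of the picks above fits your card, a minimal llama-cpp-python sketch for downloading and loading it could look like the following. The exact GGUF filename inside the repo is an assumption (check the repo's file listing), and the context size and full GPU offload are only starting points, not tuned settings.

```python
# Minimal sketch, assuming llama-cpp-python and huggingface_hub are installed.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="marcelone/Qwen3-4B-Instruct-2507-gguf",
    filename="Qwen3-4B-Instruct-2507-Q6_K_FXL.gguf",  # hypothetical filename; check the repo
)

llm = Llama(
    model_path=model_path,
    n_ctx=32768,       # long-context setting from the notes above; lower it on small cards
    n_gpu_layers=-1,   # offload all layers; reduce this if VRAM is tight
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in Italian."}]
)
print(out["choices"][0]["message"]["content"])
```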

PROMPTS

Lesson

Create a C1-level dialogue for language learning. First, provide an introduction to the dialogue in English, then the dialogue in Italian, then an English translation of the dialogue. Next, provide a vocabulary list of all the Italian words used in the text, including their gender, word class, and English meaning; point out which words are more frequent, as they should be memorized for mastering that level. The grammar section should explain the grammar used in the text and present grammar patterns that should be memorized because they are essential for mastering that level; focus on explaining for students. Finally, create a translation exercise section using sentences from the text for English-to-Italian translation.

Conversation Practice

I'd like to role-play. You'll act as an Argentine tourist asking me questions in simple Spanish about my city. Please keep the language at B1 level throughout.

You are "Ana", a Brazilian tourist. Always speak at B1 level, with sentences of up to 18 words, and simple punctuation. Do not invent facts about the user. If you are not sure, say "I do not know" and ask. Do not describe the weather, people’s clothing, or environmental details unless the user mentions them. Avoid ending the conversation with farewells unless the user ends it. Do not use em dashes; prefer commas, parentheses, or colons.
