---
license: apache-2.0
base_model:
- Qwen/Qwen3-4B-Thinking-2507
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- thinking
- llama.cpp
- f32
---
# Qwen3-4B-Thinking-2507-GGUF
Qwen3-4B-Thinking-2507 is a 4-billion-parameter causal language model specialized for advanced reasoning, with significant improvements in logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise. It supports a long native context length of 262,144 tokens, enabling deep, multi-step problem solving. Unlike earlier versions, this model operates solely in "thinking mode": it automatically generates intermediate reasoning steps enclosed in special tags, improving the transparency and interpretability of its outputs. Architecturally, it has 36 layers, uses grouped-query attention (GQA) with 32 query heads and 8 key-value heads, and benefits from dedicated pretraining and post-training stages that optimize both the quality and depth of its reasoning.
Qwen3-4B-Thinking-2507 also improves general capabilities such as instruction following, tool use, text generation, and alignment with human preferences, making it well suited to complex reasoning tasks in scientific, coding, and academic domains. The model is fully integrated with the Hugging Face transformers library and can be deployed with extended context support via toolkits such as vLLM and SGLang. Detailed benchmarks, deployment guidelines, and usage examples are available in the official repository, making this a strong choice for research and applications that require transparent, scalable AI reasoning.
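Because the model emits its intermediate reasoning inside special tags before the final answer, downstream code usually wants to separate the two. Below is a minimal sketch assuming the Qwen3 thinking convention, where reasoning precedes a closing `</think>` tag (in thinking mode the chat template often injects the opening `<think>` tag itself, so only the closing tag may appear in the generated text); the helper name `split_thinking` is illustrative, not part of any library:

```python
def split_thinking(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split generated text into (reasoning, final_answer).

    Assumes the Qwen3 thinking convention: reasoning content comes
    before a closing </think> tag and the final answer comes after it.
    If no closing tag is found, the whole text is treated as the answer.
    """
    if close_tag in text:
        reasoning, _, answer = text.partition(close_tag)
        # The opening <think> tag may or may not be present in the output.
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()

# Example with a synthetic output string:
raw = "<think>2 + 2 equals 4.</think>The answer is 4."
reasoning, answer = split_thinking(raw)
print(reasoning)  # → 2 + 2 equals 4.
print(answer)     # → The answer is 4.
```

The same split works whether you run the model through transformers, vLLM, or a llama.cpp server, since it only operates on the decoded output string.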
## Model Files
| File Name | Size | Quant Type |
|---|---|---|
| Qwen3-4B-Thinking-2507.BF16.gguf | 8.05 GB | BF16 |
| Qwen3-4B-Thinking-2507.F16.gguf | 8.05 GB | F16 |
| Qwen3-4B-Thinking-2507.F32.gguf | 16.1 GB | F32 |
| Qwen3-4B-Thinking-2507.Q2_K.gguf | 1.67 GB | Q2_K |
| Qwen3-4B-Thinking-2507.Q3_K_L.gguf | 2.24 GB | Q3_K_L |
| Qwen3-4B-Thinking-2507.Q3_K_M.gguf | 2.08 GB | Q3_K_M |
| Qwen3-4B-Thinking-2507.Q3_K_S.gguf | 1.89 GB | Q3_K_S |
| Qwen3-4B-Thinking-2507.Q4_K_M.gguf | 2.5 GB | Q4_K_M |
| Qwen3-4B-Thinking-2507.Q4_K_S.gguf | 2.38 GB | Q4_K_S |
| Qwen3-4B-Thinking-2507.Q5_K_M.gguf | 2.89 GB | Q5_K_M |
| Qwen3-4B-Thinking-2507.Q5_K_S.gguf | 2.82 GB | Q5_K_S |
| Qwen3-4B-Thinking-2507.Q6_K.gguf | 3.31 GB | Q6_K |
| Qwen3-4B-Thinking-2507.Q8_0.gguf | 4.28 GB | Q8_0 |
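When picking a quant from the table above, a common rule of thumb is that runtime memory use is roughly the file size plus some overhead for the KV cache and buffers. The sketch below encodes the file sizes from the table and an assumed ~20% overhead factor; the overhead is an approximation for illustration, not an exact llama.cpp measurement, and the helper name `largest_quant_for` is hypothetical:

```python
# File sizes (GB) taken from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 1.67, "Q3_K_S": 1.89, "Q3_K_M": 2.08, "Q3_K_L": 2.24,
    "Q4_K_S": 2.38, "Q4_K_M": 2.50, "Q5_K_S": 2.82, "Q5_K_M": 2.89,
    "Q6_K": 3.31, "Q8_0": 4.28, "BF16": 8.05, "F16": 8.05, "F32": 16.1,
}

def largest_quant_for(ram_gb: float, overhead: float = 1.2):
    """Return the largest (generally highest-quality) quant whose file,
    scaled by an assumed runtime overhead factor, fits in `ram_gb`.
    Returns None if even Q2_K does not fit."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s * overhead <= ram_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

print(largest_quant_for(4.0))   # Q6_K fits in a ~4 GB budget
print(largest_quant_for(20.0))  # F32 fits in a ~20 GB budget
```

Larger quants within your memory budget generally preserve more of the original model's quality, so this picks the biggest file that fits.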
## Quants Usage
The files above are sorted by size, which does not necessarily reflect quality; IQ-quants are often preferable to non-IQ quants of similar size.
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):