---
license: apache-2.0
base_model:
- Qwen/Qwen3-4B-Thinking-2507
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- thinking
- llama.cpp
- f32
---
# Qwen3-4B-Thinking-2507-GGUF
Qwen3-4B-Thinking-2507 is a 4-billion-parameter causal language model specialized for advanced reasoning, with significant improvements in logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise. It supports a long native context length of 262,144 tokens, enabling deep, multi-step problem solving. Unlike earlier versions, this model operates solely in "thinking mode": it automatically generates intermediate reasoning steps enclosed in special tags, improving the transparency and interpretability of its outputs. Architecturally, it has 36 layers, uses grouped-query attention (GQA) with 32 query heads and 8 key-value heads, and benefits from dedicated pretraining and post-training stages that optimize both the quality and depth of its reasoning.
Qwen3-4B-Thinking-2507 also improves general capabilities such as instruction following, tool use, text generation, and alignment with human preferences, making it well suited to complex reasoning tasks in scientific, coding, and academic domains. The model is fully integrated with the Hugging Face transformers library and can be deployed with extended context support via toolkits such as vLLM and SGLang. Detailed benchmarks, deployment guidelines, and usage examples are available in the official repository, making this a strong choice for research and applications that require transparent, scalable AI reasoning.
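Because the model emits its intermediate reasoning inside special tags before the final answer, downstream code usually wants to separate the two. Below is a minimal sketch assuming the Qwen3 thinking convention, where reasoning precedes a closing `</think>` tag (in thinking mode the chat template often injects the opening `<think>` tag itself, so only the closing tag may appear in the generated text); the helper name `split_thinking` is illustrative, not part of any library:

```python
def split_thinking(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split generated text into (reasoning, final_answer).

    Assumes the Qwen3 thinking convention: reasoning content comes
    before a closing </think> tag and the final answer comes after it.
    If no closing tag is found, the whole text is treated as the answer.
    """
    if close_tag in text:
        reasoning, _, answer = text.partition(close_tag)
        # The opening <think> tag may or may not be present in the output.
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()

# Example with a synthetic output string:
raw = "<think>2 + 2 equals 4.</think>The answer is 4."
reasoning, answer = split_thinking(raw)
print(reasoning)  # → 2 + 2 equals 4.
print(answer)     # → The answer is 4.
```

The same split works whether you run the model through transformers, vLLM, or a llama.cpp server, since it only operates on the decoded output string.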
## Model Files
| File Name | Size | Quant Type |
|---|---|---|
| Qwen3-4B-Thinking-2507.BF16.gguf | 8.05 GB | BF16 |
| Qwen3-4B-Thinking-2507.F16.gguf | 8.05 GB | F16 |
| Qwen3-4B-Thinking-2507.F32.gguf | 16.1 GB | F32 |
| Qwen3-4B-Thinking-2507.Q2_K.gguf | 1.67 GB | Q2_K |
| Qwen3-4B-Thinking-2507.Q3_K_L.gguf | 2.24 GB | Q3_K_L |
| Qwen3-4B-Thinking-2507.Q3_K_M.gguf | 2.08 GB | Q3_K_M |
| Qwen3-4B-Thinking-2507.Q3_K_S.gguf | 1.89 GB | Q3_K_S |
| Qwen3-4B-Thinking-2507.Q4_K_M.gguf | 2.5 GB | Q4_K_M |
| Qwen3-4B-Thinking-2507.Q4_K_S.gguf | 2.38 GB | Q4_K_S |
| Qwen3-4B-Thinking-2507.Q5_K_M.gguf | 2.89 GB | Q5_K_M |
| Qwen3-4B-Thinking-2507.Q5_K_S.gguf | 2.82 GB | Q5_K_S |
| Qwen3-4B-Thinking-2507.Q6_K.gguf | 3.31 GB | Q6_K |
| Qwen3-4B-Thinking-2507.Q8_0.gguf | 4.28 GB | Q8_0 |
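When picking a quant from the table above, a common rule of thumb is that runtime memory use is roughly the file size plus some overhead for the KV cache and buffers. The sketch below encodes the file sizes from the table and an assumed ~20% overhead factor; the overhead is an approximation for illustration, not an exact llama.cpp measurement, and the helper name `largest_quant_for` is hypothetical:

```python
# File sizes (GB) taken from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 1.67, "Q3_K_S": 1.89, "Q3_K_M": 2.08, "Q3_K_L": 2.24,
    "Q4_K_S": 2.38, "Q4_K_M": 2.50, "Q5_K_S": 2.82, "Q5_K_M": 2.89,
    "Q6_K": 3.31, "Q8_0": 4.28, "BF16": 8.05, "F16": 8.05, "F32": 16.1,
}

def largest_quant_for(ram_gb: float, overhead: float = 1.2):
    """Return the largest (generally highest-quality) quant whose file,
    scaled by an assumed runtime overhead factor, fits in `ram_gb`.
    Returns None if even Q2_K does not fit."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s * overhead <= ram_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

print(largest_quant_for(4.0))   # Q6_K fits in a ~4 GB budget
print(largest_quant_for(20.0))  # F32 fits in a ~20 GB budget
```

Larger quants within your memory budget generally preserve more of the original model's quality, so this picks the biggest file that fits.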
## Quants Usage
The files above are sorted by size, which does not necessarily reflect quality; IQ-quants are often preferable to non-IQ quants of similar size.
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):