Upload folder using huggingface_hub
README.md
ADDED
@@ -0,0 +1,236 @@
1 |
+
---
|
2 |
+
tags:
|
3 |
+
- unsloth
|
4 |
+
base_model:
|
5 |
+
- Qwen/Qwen3-4B-Instruct-2507
|
6 |
+
library_name: transformers
|
7 |
+
license: apache-2.0
|
8 |
+
license_link: https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507/blob/main/LICENSE
|
9 |
+
pipeline_tag: text-generation
|
10 |
+
---
|
11 |
+
> [!NOTE]
|
12 |
+
> Includes Unsloth **chat template fixes**! <br> For `llama.cpp`, use `--jinja`
|
13 |
+
>
|
14 |
+
|
15 |
+
<div>
|
16 |
+
<p style="margin-top: 0;margin-bottom: 0;">
|
17 |
+
<em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
|
18 |
+
</p>
|
19 |
+
<div style="display: flex; gap: 5px; align-items: center; ">
|
20 |
+
<a href="https://github.com/unslothai/unsloth/">
|
21 |
+
<img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
|
22 |
+
</a>
|
23 |
+
<a href="https://discord.gg/unsloth">
|
24 |
+
<img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
|
25 |
+
</a>
|
26 |
+
<a href="https://docs.unsloth.ai/">
|
27 |
+
<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
|
28 |
+
</a>
|
29 |
+
</div>
|
30 |
+
</div>
|
31 |
+
|
32 |
+
|
33 |
+
# Qwen3-4B-Instruct-2507
|
34 |
+
<a href="https://chat.qwen.ai" target="_blank" style="margin: 2px;">
|
35 |
+
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
|
36 |
+
</a>
|
37 |
+
|
38 |
+
## Highlights
|
39 |
+
|
40 |
+
We introduce the updated version of the **Qwen3-4B non-thinking mode**, named **Qwen3-4B-Instruct-2507**, featuring the following key enhancements:
|
41 |
+
|
42 |
+
- **Significant improvements** in general capabilities, including **instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage**.
|
43 |
+
- **Substantial gains** in long-tail knowledge coverage across **multiple languages**.
|
44 |
+
- **Markedly better alignment** with user preferences in **subjective and open-ended tasks**, enabling more helpful responses and higher-quality text generation.
|
45 |
+
- **Enhanced capabilities** in **256K long-context understanding**.
|
46 |
+
|
47 |
+

|
48 |
+
|
49 |
+
## Model Overview
|
50 |
+
|
51 |
+
**Qwen3-4B-Instruct-2507** has the following features:
|
52 |
+
- Type: Causal Language Models
|
53 |
+
- Training Stage: Pretraining & Post-training
|
54 |
+
- Number of Parameters: 4.0B
|
55 |
+
- Number of Parameters (Non-Embedding): 3.6B
|
56 |
+
- Number of Layers: 36
|
57 |
+
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
|
58 |
+
- Context Length: **262,144 natively**.
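
As a rough illustration of what the layer count, GQA layout, and context length above imply for memory, the sketch below estimates the size of the KV cache. The head dimension of 128 and the 2 bytes per value (16-bit cache) are assumptions not stated in this README, and `kv_cache_bytes` is a hypothetical helper; real usage depends on the serving framework.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_tokens, num_layers=36, num_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Estimate KV-cache size in bytes for a single sequence.

    num_layers and num_kv_heads come from the list above;
    head_dim=128 and bytes_per_value=2 are ASSUMPTIONS.
    """
    # Factor 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens


if __name__ == "__main__":
    for context in (262_144, 32_768):
        print(f"{context:>7} tokens: {kv_cache_bytes(context) / GIB:.1f} GiB")
```

Under these assumptions, the cache for one full-length sequence is several times larger than the model weights themselves, so a shorter context is the first thing to try when memory is tight.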
|
59 |
+
|
60 |
+
**NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.**
|
61 |
+
|
62 |
+
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
|
63 |
+
|
64 |
+
|
65 |
+
## Performance
|
66 |
+
|
67 |
+
| | GPT-4.1-nano-2025-04-14 | Qwen3-30B-A3B Non-Thinking | Qwen3-4B Non-Thinking | Qwen3-4B-Instruct-2507 |
|
68 |
+
|--- | --- | --- | --- | --- |
|
69 |
+
| **Knowledge** | | | | |
|
70 |
+
| MMLU-Pro | 62.8 | 69.1 | 58.0 | **69.6** |
|
71 |
+
| MMLU-Redux | 80.2 | 84.1 | 77.3 | **84.2** |
|
72 |
+
| GPQA | 50.3 | 54.8 | 41.7 | **62.0** |
|
73 |
+
| SuperGPQA | 32.2 | 42.2 | 32.0 | **42.8** |
|
74 |
+
| **Reasoning** | | | | |
|
75 |
+
| AIME25 | 22.7 | 21.6 | 19.1 | **47.4** |
|
76 |
+
| HMMT25 | 9.7 | 12.0 | 12.1 | **31.0** |
|
77 |
+
| ZebraLogic | 14.8 | 33.2 | 35.2 | **80.2** |
|
78 |
+
| LiveBench 20241125 | 41.5 | 59.4 | 48.4 | **63.0** |
|
79 |
+
| **Coding** | | | | |
|
80 |
+
| LiveCodeBench v6 (25.02-25.05) | 31.5 | 29.0 | 26.4 | **35.1** |
|
81 |
+
| MultiPL-E | 76.3 | 74.6 | 66.6 | **76.8** |
|
82 |
+
| Aider-Polyglot | 9.8 | **24.4** | 13.8 | 12.9 |
|
83 |
+
| **Alignment** | | | | |
|
84 |
+
| IFEval | 74.5 | **83.7** | 81.2 | 83.4 |
|
85 |
+
| Arena-Hard v2* | 15.9 | 24.8 | 9.5 | **43.4** |
|
86 |
+
| Creative Writing v3 | 72.7 | 68.1 | 53.6 | **83.5** |
|
87 |
+
| WritingBench | 66.9 | 72.2 | 68.5 | **83.4** |
|
88 |
+
| **Agent** | | | | |
|
89 |
+
| BFCL-v3 | 53.0 | 58.6 | 57.6 | **61.9** |
|
90 |
+
| TAU1-Retail | 23.5 | 38.3 | 24.3 | **48.7** |
|
91 |
+
| TAU1-Airline | 14.0 | 18.0 | 16.0 | **32.0** |
|
92 |
+
| TAU2-Retail | - | 31.6 | 28.1 | **40.4** |
|
93 |
+
| TAU2-Airline | - | 18.0 | 12.0 | **24.0** |
|
94 |
+
| TAU2-Telecom | - | **18.4** | 17.5 | 13.2 |
|
95 |
+
| **Multilingualism** | | | | |
|
96 |
+
| MultiIF | 60.7 | **70.8** | 61.3 | 69.0 |
|
97 |
+
| MMLU-ProX | 56.2 | **65.1** | 49.6 | 61.6 |
|
98 |
+
| INCLUDE | 58.6 | **67.8** | 53.8 | 60.1 |
|
99 |
+
| PolyMATH | 15.6 | 23.3 | 16.6 | **31.1** |
|
100 |
+
|
101 |
+
*: For reproducibility, we report the win rates evaluated by GPT-4.1.
|
102 |
+
|
103 |
+
|
104 |
+
## Quickstart
|
105 |
+
|
106 |
+
Support for Qwen3 is included in recent releases of Hugging Face `transformers`, and we advise you to use the latest version.
|
107 |
+
|
108 |
+
With `transformers<4.51.0`, you will encounter the following error:
|
109 |
+
```
|
110 |
+
KeyError: 'qwen3'
|
111 |
+
```
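
To fail early with a clearer message than the `KeyError` above, the installed version can be checked before loading the model. This is a minimal sketch: `parse_version`, `supports_qwen3`, and `installed_transformers_ok` are hypothetical helpers, and the parser only looks at the leading digits of the first three version components, so a pre-release such as `4.51.0rc1` is treated like `4.51.0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum version named in this README.
MIN_TRANSFORMERS = (4, 51, 0)


def parse_version(text):
    """Turn '4.51.0.dev0' into (4, 51, 0); missing parts become 0."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_qwen3(text):
    """True if the given transformers version string is >= 4.51.0."""
    return parse_version(text) >= MIN_TRANSFORMERS


def installed_transformers_ok():
    """Check the locally installed package; False if it is not installed."""
    try:
        return supports_qwen3(version("transformers"))
    except PackageNotFoundError:
        return False
```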
|
112 |
+
|
113 |
+
The following code snippet illustrates how to use the model to generate content based on given inputs.
|
114 |
+
```python
|
115 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
116 |
+
|
117 |
+
model_name = "Qwen/Qwen3-4B-Instruct-2507"
|
118 |
+
|
119 |
+
# load the tokenizer and the model
|
120 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
121 |
+
model = AutoModelForCausalLM.from_pretrained(
|
122 |
+
    model_name,
|
123 |
+
    torch_dtype="auto",
|
124 |
+
    device_map="auto"
|
125 |
+
)
|
126 |
+
|
127 |
+
# prepare the model input
|
128 |
+
prompt = "Give me a short introduction to large language models."
|
129 |
+
messages = [
|
130 |
+
    {"role": "user", "content": prompt}
|
131 |
+
]
|
132 |
+
text = tokenizer.apply_chat_template(
|
133 |
+
    messages,
|
134 |
+
    tokenize=False,
|
135 |
+
    add_generation_prompt=True,
|
136 |
+
)
|
137 |
+
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
|
138 |
+
|
139 |
+
# conduct text completion
|
140 |
+
generated_ids = model.generate(
|
141 |
+
    **model_inputs,
|
142 |
+
    max_new_tokens=16384
|
143 |
+
)
|
144 |
+
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
|
145 |
+
|
146 |
+
content = tokenizer.decode(output_ids, skip_special_tokens=True)
|
147 |
+
|
148 |
+
print("content:", content)
|
149 |
+
```
|
150 |
+
|
151 |
+
For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
|
152 |
+
- SGLang:
|
153 |
+
```shell
|
154 |
+
python -m sglang.launch_server --model-path Qwen/Qwen3-4B-Instruct-2507 --context-length 262144
|
155 |
+
```
|
156 |
+
- vLLM:
|
157 |
+
```shell
|
158 |
+
vllm serve Qwen/Qwen3-4B-Instruct-2507 --max-model-len 262144
|
159 |
+
```
|
160 |
+
|
161 |
+
**Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
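
Once a server is running, it can be queried over HTTP. The sketch below builds a chat completion request with the standard library only. The base URL `http://localhost:8000/v1` is an assumption (adjust host and port to your server), the `/chat/completions` path is the usual OpenAI-compatible route, and `build_chat_request` and `chat` are hypothetical helpers rather than part of either framework.

```python
import json
import urllib.request


def build_chat_request(prompt,
                       base_url="http://localhost:8000/v1",  # ASSUMED address
                       model="Qwen/Qwen3-4B-Instruct-2507",
                       max_tokens=16384):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.8,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",  # placeholder key for a local server
        },
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("Give me a short introduction to large language models.")` requires a running server; `build_chat_request` alone can be used to inspect the payload offline.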
|
162 |
+
|
163 |
+
For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
|
164 |
+
|
165 |
+
## Agentic Use
|
166 |
+
|
167 |
+
Qwen3 excels at tool calling. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of the agentic abilities of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.
|
168 |
+
|
169 |
+
To define the available tools, you can use the MCP configuration file, use the integrated tools of Qwen-Agent, or integrate other tools yourself.
|
170 |
+
```python
|
171 |
+
from qwen_agent.agents import Assistant
|
172 |
+
|
173 |
+
# Define LLM
|
174 |
+
llm_cfg = {
|
175 |
+
    'model': 'Qwen3-4B-Instruct-2507',
|
176 |
+
|
177 |
+
    # Use a custom endpoint compatible with OpenAI API:
|
178 |
+
    'model_server': 'http://localhost:8000/v1',  # api_base
|
179 |
+
    'api_key': 'EMPTY',
|
180 |
+
}
|
181 |
+
|
182 |
+
# Define Tools
|
183 |
+
tools = [
|
184 |
+
    {'mcpServers': {  # You can specify the MCP configuration file
|
185 |
+
            'time': {
|
186 |
+
                'command': 'uvx',
|
187 |
+
                'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
|
188 |
+
            },
|
189 |
+
            "fetch": {
|
190 |
+
                "command": "uvx",
|
191 |
+
                "args": ["mcp-server-fetch"]
|
192 |
+
            }
|
193 |
+
        }
|
194 |
+
    },
|
195 |
+
    'code_interpreter',  # Built-in tools
|
196 |
+
]
|
197 |
+
|
198 |
+
# Define Agent
|
199 |
+
bot = Assistant(llm=llm_cfg, function_list=tools)
|
200 |
+
|
201 |
+
# Streaming generation
|
202 |
+
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
|
203 |
+
for responses in bot.run(messages=messages):
|
204 |
+
    pass
|
205 |
+
print(responses)
|
206 |
+
```
|
207 |
+
|
208 |
+
## Best Practices
|
209 |
+
|
210 |
+
To achieve optimal performance, we recommend the following settings:
|
211 |
+
|
212 |
+
1. **Sampling Parameters**:
|
213 |
+
- We suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.
|
214 |
+
- For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
|
215 |
+
|
216 |
+
2. **Adequate Output Length**: We recommend using an output length of 16,384 tokens for most queries, which is adequate for instruct models.
|
217 |
+
|
218 |
+
3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
|
219 |
+
- **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
|
220 |
+
- **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."
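
The settings above can be collected in one place, as in the minimal sketch below. The keyword names in `SAMPLING_KWARGS` are the ones we would expect `generate()` in `transformers` to accept (check them against your installed version), and the helper functions are hypothetical conveniences for benchmarking, not part of any library.

```python
import re

# Suggested sampling settings; do_sample=True is needed for them to take effect.
SAMPLING_KWARGS = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0.0,
    "max_new_tokens": 16384,
}

MATH_SUFFIX = "Please reason step by step, and put your final answer within \\boxed{}."
CHOICE_SUFFIX = (
    "Please show your choice in the `answer` field with only the choice "
    'letter, e.g., `"answer": "C"`.'
)


def math_prompt(question):
    """Append the math instruction to a question."""
    return f"{question}\n{MATH_SUFFIX}"


def choice_prompt(question):
    """Append the multiple-choice instruction to a question."""
    return f"{question}\n{CHOICE_SUFFIX}"


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} without nested braces, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def extract_choice(text):
    """Return the letter from an `"answer": "X"` field, or None if absent."""
    match = re.search(r'"answer"\s*:\s*"([A-Za-z])"', text)
    return match.group(1).upper() if match else None
```

With the Quickstart snippet, the sampling settings would be passed as `model.generate(**model_inputs, **SAMPLING_KWARGS)` in place of the bare `max_new_tokens` argument.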
|
221 |
+
|
222 |
+
### Citation
|
223 |
+
|
224 |
+
If you find our work helpful, feel free to cite us.
|
225 |
+
|
226 |
+
```
|
227 |
+
@misc{qwen3technicalreport,
|
228 |
+
title={Qwen3 Technical Report},
|
229 |
+
author={Qwen Team},
|
230 |
+
year={2025},
|
231 |
+
eprint={2505.09388},
|
232 |
+
archivePrefix={arXiv},
|
233 |
+
primaryClass={cs.CL},
|
234 |
+
url={https://arxiv.org/abs/2505.09388},
|
235 |
+
}
|
236 |
+
```
|