Qwen3-4B-Thinking-2507
Model Description
Qwen3-4B-Thinking-2507 is a 4-billion-parameter causal language model tuned for deep, structured reasoning. It runs exclusively in "thinking mode", automatically revealing its chain-of-thought in its outputs without requiring any special flags. The model supports a native context window of 262,144 tokens, making it well suited to multi-step logic, academic tasks, math, and code.
Features
- Explicit reasoning: Outputs contain intermediate steps, enclosed in thinking tags, to improve transparency and interpretability.
- Massive context: Handles up to 262,144 tokens natively.
- Advanced reasoning: Excels in logic, math, science, coding, and academic benchmarks.
- General capability uplift: Strengthened instruction following, tool use, text generation, and human preference alignment.
Use Cases
- Explaining complex problems with a clear reasoning workflow
- Academic or STEM tutoring applications
- Code generation with logic transparency
- Agents that need to show how they think through a query
- Processing lengthy documents with deep inference
Inputs and Outputs
Input:
- Natural language problems, coding tasks, or academic questions that benefit from step-by-step decomposition.
Output:
- Structured chain-of-thought wrapped in `<think>…</think>` tags, followed by the final answer or solution.
- Note: the default chat template pre-fills the opening `<think>` tag, so the raw output may contain only the closing `</think>` tag (see the parsing sketch below).
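Because of that note, a parser cannot rely on finding a matching pair of tags. The following is a minimal, framework-agnostic sketch for splitting the reasoning trace from the final answer; the function name `split_thinking` and the sample string are illustrative and not part of Nexa-SDK or the model's tooling.

```python
# Minimal sketch: split a Qwen3-Thinking response into reasoning and answer.
# Handles both a full <think>...</think> block and the common case where the
# chat template pre-fills <think>, so only the closing tag appears.

def split_thinking(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in raw:
        # No thinking block found; treat the whole output as the answer.
        return "", raw.strip()
    head, _, tail = raw.partition(close_tag)
    # Drop the opening tag if the template did not already consume it.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "Let me check: 12 * 7 = 84.</think>The answer is 84."
    reasoning, answer = split_thinking(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```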
How to use
⚠️ Hardware requirement: the model currently runs only on Qualcomm NPUs (e.g., Snapdragon-powered AI PCs).
Apple NPU support is planned next.
1) Install Nexa-SDK
2) Get an access token
Create a token in the Model Hub, then log in:
nexa config set license '<access_token>'
3) Run the model from the CLI:
nexa infer NexaAI/Qwen3-4B-Thinking-2507-npu
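If you prefer to drive the CLI from a script, the sketch below shows one way to do it. It assumes, hypothetically, that `nexa infer` reads the prompt from standard input and prints the response to standard output; verify the actual invocation against the Nexa-SDK documentation before relying on it.

```python
# Minimal sketch: invoke the Nexa CLI from Python.
# Assumption (hypothetical, not confirmed by Nexa-SDK docs): `nexa infer`
# accepts the prompt on stdin and writes the model response to stdout.
import subprocess

MODEL = "NexaAI/Qwen3-4B-Thinking-2507-npu"
PROMPT = "Explain why the sum of two odd numbers is even."

result = subprocess.run(
    ["nexa", "infer", MODEL],
    input=PROMPT,
    capture_output=True,
    text=True,
    check=True,
)

# The raw output can be passed to split_thinking() from the earlier sketch
# to separate the reasoning trace from the final answer.
print(result.stdout)
```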
License
- Licensed under Apache-2.0