Qwen3-4B-Thinking-2507
Model Description
Qwen3-4B-Thinking-2507 is a 4-billion-parameter causal language model tuned for deep, structured reasoning. It runs exclusively in "thinking mode", automatically revealing its chain-of-thought in its outputs without requiring any special flags. The model supports a native context window of 262,144 tokens, making it well suited to multi-step logic, academic tasks, math, and code.
Features
- Explicit reasoning: Outputs contain intermediate steps, enclosed in thinking tags, to improve transparency and interpretability.
- Massive context: Handles up to 262,144 tokens natively.
- Advanced reasoning: Excels in logic, math, science, coding, and academic benchmarks.
- General capability uplift: Strengthened instruction following, tool use, text generation, and human preference alignment.
Use Cases
- Explaining complex problems with a clear reasoning workflow
- Academic or STEM tutoring applications
- Code generation with logic transparency
- Agents that need to show how they think through a query
- Processing lengthy documents with deep inference
Inputs and Outputs
Input:
- Natural language problems, coding tasks, or academic questions that benefit from step-by-step decomposition.
Output:
- Structured chain-of-thought wrapped in `<think>…</think>` tags, followed by the final answer or solution.
- Note: the default chat template pre-fills the opening `<think>` tag, so the raw output may contain only the closing `</think>` tag (see the parsing sketch below).
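Because of that note, a parser cannot rely on finding a matching pair of tags. The following is a minimal, framework-agnostic sketch for splitting the reasoning trace from the final answer; the function name `split_thinking` and the sample string are illustrative and not part of Nexa-SDK or the model's tooling.

```python
# Minimal sketch: split a Qwen3-Thinking response into reasoning and answer.
# Handles both a full <think>...</think> block and the common case where the
# chat template pre-fills <think>, so only the closing tag appears.

def split_thinking(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in raw:
        # No thinking block found; treat the whole output as the answer.
        return "", raw.strip()
    head, _, tail = raw.partition(close_tag)
    # Drop the opening tag if the template did not already consume it.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "Let me check: 12 * 7 = 84.</think>The answer is 84."
    reasoning, answer = split_thinking(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```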
How to use
⚠️ Hardware requirement: the model currently runs only on Qualcomm NPUs (e.g., Snapdragon-powered AI PCs).
Apple NPU support is planned next.
1) Install Nexa-SDK
2) Get an access token
Create a token in the Model Hub, then log in:
nexa config set license '<access_token>'
3) Run the model from the CLI:
nexa infer NexaAI/Qwen3-4B-Thinking-2507-npu
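If you prefer to drive the CLI from a script, the sketch below shows one way to do it. It assumes, hypothetically, that `nexa infer` reads the prompt from standard input and prints the response to standard output; verify the actual invocation against the Nexa-SDK documentation before relying on it.

```python
# Minimal sketch: invoke the Nexa CLI from Python.
# Assumption (hypothetical, not confirmed by Nexa-SDK docs): `nexa infer`
# accepts the prompt on stdin and writes the model response to stdout.
import subprocess

MODEL = "NexaAI/Qwen3-4B-Thinking-2507-npu"
PROMPT = "Explain why the sum of two odd numbers is even."

result = subprocess.run(
    ["nexa", "infer", MODEL],
    input=PROMPT,
    capture_output=True,
    text=True,
    check=True,
)

# The raw output can be passed to split_thinking() from the earlier sketch
# to separate the reasoning trace from the final answer.
print(result.stdout)
```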
License
- Licensed under Apache-2.0