Overview
QwQ-32B-Distill-Qwen-1.5B-Alpha is a language model built on top of the DeepSeek-R1-Distill-Qwen-1.5B base. Developed entirely by a solo developer, with valuable inspiration from Berkeley's research, the model employs a reinforcement learning distillation framework that substantially improves performance while keeping training-data requirements and compute costs to a minimum. Despite having only 1.5B parameters, it achieves an MMLU score of 47.18 and outperforms prior baselines on multiple math and reasoning benchmarks.
Data
Our training dataset comprises 6,170 meticulously curated problem–answer pairs drawn from high-quality sources:
- AIME problems (QwQ-32B generated)
- AMC problems (QwQ-32B generated)
- MMLU problems (QwQ-32B generated)
- Complementary academic math and reasoning datasets (QwQ-32B generated)
By focusing on a lean yet highly informative dataset, the model efficiently learns critical reasoning capabilities without the burden of excessive data volume.
All answers were generated with QwQ-32B, using the problems in each of the datasets listed above, along with other reference datasets, as source material.
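For illustration, answer generation of this kind can be scripted against a locally served QwQ-32B through its OpenAI-compatible endpoint. This is a minimal sketch only; the server URL, sampling temperature, and prompt handling are assumptions, not the exact pipeline used to build the dataset.

```python
# Hypothetical sketch: generating distillation answers with a locally
# served QwQ-32B. URL and sampling settings are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def generate_answer(problem: str) -> str:
    """Ask QwQ-32B to produce a worked solution for one curated problem."""
    completion = client.chat.completions.create(
        model="Qwen/QwQ-32B",
        messages=[{"role": "user", "content": problem}],
        temperature=0.6,
    )
    return completion.choices[0].message.content

# Each curated problem is paired with the generated answer.
pair = {"problem": "If 3x + 5 = 20, what is x?",
        "answer": generate_answer("If 3x + 5 = 20, what is x?")}
```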
Training Recipe
To maximize performance with minimal resources, QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha utilizes an innovative training strategy that includes:
- Scaled Group Relative Policy Optimization (GRPO): an adaptation of PPO that normalizes the advantage function across samples generated from the same prompt (see the sketch after this list).
- KL Divergence Regularization: additional regularization applied on top of the surrogate loss to prevent significant policy drift.
- Iterative Context Scaling: progressive expansion of the context length to boost model performance while reducing compute costs.
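As a rough illustration of the first two points, the sketch below computes group-normalized advantages and a KL-penalized objective. It omits PPO-style ratio clipping for brevity, and the reward values and `beta` are illustrative assumptions, not this model's training hyperparameters.

```python
# Minimal sketch of GRPO-style advantages plus a KL penalty.
# Rewards and beta are illustrative; ratio clipping is omitted for brevity.
import math

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own prompt group (GRPO's core idea)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

def grpo_objective(ratios: list[float], advantages: list[float],
                   kl: float, beta: float = 0.04) -> float:
    """Policy-gradient surrogate minus the KL regularizer described above."""
    surrogate = sum(r * a for r, a in zip(ratios, advantages)) / len(ratios)
    return surrogate - beta * kl

# Four completions for one prompt, rewarded 1 if correct and 0 otherwise.
adv = group_advantages([1.0, 0.0, 0.0, 1.0])   # -> [~1, ~-1, ~-1, ~1]
print(grpo_objective([1.0, 1.0, 1.0, 1.0], adv, kl=0.02))
```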
Training was carried out on H200 GPUs for 336 hours at an exceptionally low total cost of approximately $1,341 (about $4 per GPU-hour). This carefully engineered approach makes it possible to obtain strong performance with very limited training data.
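The iterative context scaling mentioned above can be pictured as a staged schedule in which the maximum sequence length grows over training. The stage lengths and step counts below are purely illustrative assumptions; the actual schedule is not documented here.

```python
# Hypothetical context-scaling schedule: shorter contexts early (cheap),
# longer contexts later. Lengths and step counts are illustrative only.
STAGES = [
    (8_192, 400),    # (max context length, training steps)
    (16_384, 200),
    (32_768, 100),
]

def context_length_at(step: int) -> int:
    """Return the max context length in effect at a given training step."""
    elapsed = 0
    for max_len, steps in STAGES:
        elapsed += steps
        if step < elapsed:
            return max_len
    return STAGES[-1][0]

print(context_length_at(0), context_length_at(450), context_length_at(650))
# -> 8192 16384 32768
```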
Evaluation
The model has been rigorously evaluated on a variety of challenging benchmarks. Below is a snapshot of the results:
| Benchmark | Pass@1 | cons@64 | Avg. Token Count |
|---|---|---|---|
| MMLU | 47.18 | – | – |
| AIME 2024 | 33.33 | 53.33 | 21,191 |
| AIME 2025-I | 34.58 | 40.00 | 17,952 |
| AIME 2025-II | 21.56 | 33.33 | 21,376 |
| AMC 2023 | 75.00 | 58.92 | 44.17 |
| MATH 5000 | 38.89 | – | 20,173 |
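Here Pass@1 is the average accuracy of single sampled answers, while cons@64 scores a problem by majority vote over 64 samples. A minimal sketch of both metrics follows; the exact answer-extraction step used in evaluation is an assumption.

```python
# Illustrative scoring of Pass@1 and cons@64 for one problem.
# Real evaluation parses answers from model output; that step is assumed here.
from collections import Counter

def pass_at_1(samples: list[str], gold: str) -> float:
    """Average over samples: fraction of individual answers that are correct."""
    return sum(a == gold for a in samples) / len(samples)

def cons_at_k(samples: list[str], gold: str) -> bool:
    """Majority vote: True if the most frequent answer equals the gold answer."""
    majority, _ = Counter(samples).most_common(1)[0]
    return majority == gold

answers = ["204", "204", "202", "204"]   # 64 samples in the real protocol
print(pass_at_1(answers, "204"))          # 0.75
print(cons_at_k(answers, "204"))          # True
```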
Serving QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha
Deploy the model effortlessly using high-performance inference systems, including:
- vLLM
- Hugging Face Text Generation Inference (TGI)
- SGLang
- TensorRT-LLM
All these systems support the OpenAI Chat Completions API format, ensuring smooth integration into your applications.
How to use:
Runs on a single A40 GPU!
Serving Model:
```bash
vllm serve AXCXEPT/QwQ-32B-Distill-Qwen-1.5B-Alpha --max-model-len 32768 --enforce-eager
```
Call API Without Streaming:
```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)

# Raw string so the LaTeX backslash in \frac survives.
prompt = r"""Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop."""

completion = client.chat.completions.create(
    model="AXCXEPT/QwQ-32B-Distill-Qwen-1.5B-Alpha",
    messages=[
        {"role": "user", "content": prompt}
    ],
)
print(completion.choices[0].message.content)
```
Call API With Streaming:
```python
from openai import OpenAI

# Point the OpenAI client at vLLM's OpenAI-compatible API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Use whichever model the server is currently serving.
models = client.models.list()
model = models.data[0].id

# Raw string so the LaTeX backslash in \frac survives.
prompt = r"""Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop."""

messages = [{"role": "user", "content": prompt}]
stream = client.chat.completions.create(
    model=model,
    messages=messages,
    stream=True,
)

print("client: Start streaming chat completions...")
printed_reasoning_content = False
printed_content = False

for chunk in stream:
    delta = chunk.choices[0].delta
    # vLLM can stream the model's reasoning separately from its final answer.
    reasoning_content = getattr(delta, "reasoning_content", None)
    content = getattr(delta, "content", None)
    if reasoning_content is not None:
        if not printed_reasoning_content:
            printed_reasoning_content = True
            print("reasoning_content:", end="", flush=True)
        print(reasoning_content, end="", flush=True)
    elif content is not None:
        if not printed_content:
            printed_content = True
            print("\ncontent:", end="", flush=True)
        print(content, end="", flush=True)
```
License
This project is released under the MIT License, reflecting our commitment to open and accessible AI. We firmly believe that cutting-edge AI research should be available for anyone to use, modify, and build upon.
Special Thanks
We extend our sincere gratitude to the following teams and organizations whose contributions and ideas were instrumental in this project:
- Qwen Team (Alibaba Cloud): for creating the exceptional QwQ-32B model used as the distillation source.
- Agentica-org (Berkeley Sky Computing Lab and Berkeley AI Research): for valuable insights and pioneering reinforcement learning techniques.
- DeepSeek AI: for developing the robust foundational model upon which this research is built.
Their groundbreaking work made our innovations possible.