Achieving Superior Performance over QwQ-32B Using Only 965 Strategically Curated Samples
Model description
Most existing methods focus on distilling DeepSeek-R1 to improve reasoning ability. However, to the best of our knowledge, no distilled model has surpassed DeepSeek-R1 or QwQ-32B. We introduce NTele-R1-32B-DS, a state-of-the-art mathematical reasoning model that outperforms QwQ-32B across common reasoning benchmarks, including AIME2024/2025, MATH500, and GPQA-Diamond. Notably, NTele-R1-32B-DS is the first to score above 80/70 on the challenging AIME2024/2025 benchmarks.
| Model | Trained From | Release Date | AIME2024 | AIME2025 | MATH500 | GPQA-Diamond |
|---|---|---|---|---|---|---|
| QwQ-32B | - | 25.3.6 | 76.25 | 67.30 | 94.6 | 63.6 |
| DeepSeek-32B-Distill | Qwen2.5-32B-Instruct | 25.1.20 | 64.17 | 55.21 | 89.8 | 62.1 |
| Light-R1-32B-DS | DeepSeek-R1-Distill-Qwen-32B | 25.3.12 | 74.79 | 68.54 | 92 | 69.19 |
| AReal-boba-SFT-32B | DeepSeek-R1-Distill-Qwen-32B | 25.3.30 | 70.63 | 63.54 | 88.8 | 64.65 |
| NTele-R1-32B-DS (ours) | DeepSeek-R1-Distill-Qwen-32B | 25.4.17 | 80.42 | 73.54 | 95.4 | 66.16 |
Data Curation
We start from the S1 dataset and apply the following procedures:
- QwQ-32B as a Better Teacher:
  - We find that QwQ-32B, with its smoother flow in CoT reasoning, serves as a better teacher than DeepSeek-R1. For each question in the S1 dataset, we sampled 50 responses from QwQ-32B (see the sampling sketch after this list).
- Focusing on Harder Questions:
  - We evaluated the correctness of the responses to each question, then filtered out the easier questions whose pass rate exceeded 0.6.
- Diverse Reasoning Paths Break the Limitation of Distillation:
  - To maximize the diversity of reasoning paths, we computed the Levenshtein distance between all answers to each question and, for every question, selected up to 5 answers with the greatest distances, resulting in a final dataset of 965 samples (see the selection sketch after this list).
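A minimal sketch of the sampling step, assuming vLLM as the inference engine; the temperature, top-p, token budget, and tensor-parallel setting are illustrative assumptions rather than the exact values used:

```python
# Sketch: sample 50 candidate responses per S1 question with QwQ-32B.
# vLLM, temperature, top_p, max_tokens, and tensor_parallel_size are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/QwQ-32B", tensor_parallel_size=8)
sampling = SamplingParams(n=50, temperature=0.6, top_p=0.95, max_tokens=16384)

# Hypothetical placeholder: in practice the prompts come from the S1 dataset.
questions = ["Find the remainder when 2^2024 is divided by 1000."]

outputs = llm.generate(questions, sampling)
responses = {
    question: [candidate.text for candidate in output.outputs]
    for question, output in zip(questions, outputs)
}
```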
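A sketch of the pass-rate filter and the diversity-based selection, assuming the `Levenshtein` package for edit distance. The greedy farthest-point heuristic, the hypothetical `is_correct` checker, and the restriction of selection to correct responses are assumptions used for illustration, not a description of the exact procedure:

```python
# Sketch: keep only hard questions (pass rate <= 0.6) and, for each one, pick up
# to 5 correct responses that are maximally spread out in Levenshtein distance.
# The greedy farthest-point rule and the is_correct checker are assumptions.
import Levenshtein

PASS_RATE_THRESHOLD = 0.6
MAX_ANSWERS_PER_QUESTION = 5

def select_diverse(answers, k=MAX_ANSWERS_PER_QUESTION):
    """Greedily pick k answers that maximize pairwise Levenshtein distances."""
    answers = list(dict.fromkeys(answers))  # drop exact duplicates
    if len(answers) <= k:
        return answers
    selected = [answers[0]]
    while len(selected) < k:
        # Choose the answer whose minimum distance to the selected set is largest.
        best = max(
            (a for a in answers if a not in selected),
            key=lambda a: min(Levenshtein.distance(a, s) for s in selected),
        )
        selected.append(best)
    return selected

def curate(dataset, is_correct):
    """dataset: list of dicts with 'question', 'reference', and sampled 'responses'."""
    curated = []
    for item in dataset:
        graded = [is_correct(r, item["reference"]) for r in item["responses"]]
        pass_rate = sum(graded) / len(graded)
        if pass_rate > PASS_RATE_THRESHOLD:
            continue  # drop easier questions
        correct = [r for r, ok in zip(item["responses"], graded) if ok]
        for answer in select_diverse(correct):
            curated.append({"question": item["question"], "answer": answer})
    return curated
```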
You can access our dataset to obtain the 965 training samples.
Evaluation
We evaluate models with SkyThought.
Training Details
NTele-R1-32B-DS was trained from DeepSeek-R1-Distill-Qwen-32B on 8×H800 GPUs.
Training hyperparameters
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 6
- total_train_batch_size: 48
- total_eval_batch_size: 48
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
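For reference, a minimal sketch of equivalent Hugging Face `TrainingArguments` for the hyperparameters above; the output directory, bf16 precision, and all other unlisted settings are assumptions, and the actual training framework is not specified here:

```python
# Sketch: TrainingArguments mirroring the hyperparameters listed above.
# output_dir and bf16 are assumptions, not taken from the model card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ntele-r1-32b-ds",      # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=6,     # 8 GPUs x 1 x 6 = 48 effective batch size
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                         # assumed for H800 training
)
```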