Achieving Superior Performance over QwQ-32B Using Only 965 Strategically Curated Samples
Model description
Most existing methods focus on distilling DeepSeek-R1 to improve reasoning ability. However, to the best of our knowledge, no distilled model has surpassed DeepSeek-R1 or QwQ-32B. We introduce NTele-R1-32B-DS, a state-of-the-art mathematical reasoning model that outperforms QwQ-32B across common reasoning benchmarks, including AIME2024/2025, MATH500, and GPQA-Diamond. Notably, NTele-R1-32B-DS is the first to score above 80/70 on the challenging AIME2024/2025 benchmarks.
| Model | Trained From | Release Date | AIME2024 | AIME2025 | MATH500 | GPQA-Diamond |
|---|---|---|---|---|---|---|
| QwQ-32B | - | 25.3.6 | 76.25 | 67.30 | 94.6 | 63.6 |
| DeepSeek-32B-Distill | Qwen2.5-32B-Instruct | 25.1.20 | 64.17 | 55.21 | 89.8 | 62.1 |
| Light-R1-32B-DS | DeepSeek-R1-Distill-Qwen-32B | 25.3.12 | 74.79 | 68.54 | 92 | 69.19 |
| AReal-boba-SFT-32B | DeepSeek-R1-Distill-Qwen-32B | 25.3.30 | 70.63 | 63.54 | 88.8 | 64.65 |
| NTele-R1-32B-DS (ours) | DeepSeek-R1-Distill-Qwen-32B | 25.4.17 | 80.42 | 73.54 | 95.4 | 66.16 |
Data Curation
We start from the S1 dataset and apply the following procedures:
- QwQ-32B as a Better Teacher:
  - We find that QwQ-32B, with its smoother flow in CoT reasoning, serves as a better teacher than DeepSeek-R1. For each question in the S1 dataset, we sampled 50 responses from QwQ-32B (see the sampling sketch after this list).
- Focusing on Harder Questions:
  - We evaluated the correctness of the responses to each question, then filtered out the easier questions whose pass rate exceeded 0.6.
- Diverse Reasoning Paths Break the Limitation of Distillation:
  - To maximize the diversity of reasoning paths, we computed the Levenshtein distance between all answers to each question and, for every question, selected up to 5 answers with the greatest distances, resulting in a final dataset of 965 samples (see the selection sketch after this list).
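A minimal sketch of the sampling step, assuming vLLM as the inference engine; the temperature, top-p, token budget, and tensor-parallel setting are illustrative assumptions rather than the exact values used:

```python
# Sketch: sample 50 candidate responses per S1 question with QwQ-32B.
# vLLM, temperature, top_p, max_tokens, and tensor_parallel_size are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/QwQ-32B", tensor_parallel_size=8)
sampling = SamplingParams(n=50, temperature=0.6, top_p=0.95, max_tokens=16384)

# Hypothetical placeholder: in practice the prompts come from the S1 dataset.
questions = ["Find the remainder when 2^2024 is divided by 1000."]

outputs = llm.generate(questions, sampling)
responses = {
    question: [candidate.text for candidate in output.outputs]
    for question, output in zip(questions, outputs)
}
```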
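A sketch of the pass-rate filter and the diversity-based selection, assuming the `Levenshtein` package for edit distance. The greedy farthest-point heuristic, the hypothetical `is_correct` checker, and the restriction of selection to correct responses are assumptions used for illustration, not a description of the exact procedure:

```python
# Sketch: keep only hard questions (pass rate <= 0.6) and, for each one, pick up
# to 5 correct responses that are maximally spread out in Levenshtein distance.
# The greedy farthest-point rule and the is_correct checker are assumptions.
import Levenshtein

PASS_RATE_THRESHOLD = 0.6
MAX_ANSWERS_PER_QUESTION = 5

def select_diverse(answers, k=MAX_ANSWERS_PER_QUESTION):
    """Greedily pick k answers that maximize pairwise Levenshtein distances."""
    answers = list(dict.fromkeys(answers))  # drop exact duplicates
    if len(answers) <= k:
        return answers
    selected = [answers[0]]
    while len(selected) < k:
        # Choose the answer whose minimum distance to the selected set is largest.
        best = max(
            (a for a in answers if a not in selected),
            key=lambda a: min(Levenshtein.distance(a, s) for s in selected),
        )
        selected.append(best)
    return selected

def curate(dataset, is_correct):
    """dataset: list of dicts with 'question', 'reference', and sampled 'responses'."""
    curated = []
    for item in dataset:
        graded = [is_correct(r, item["reference"]) for r in item["responses"]]
        pass_rate = sum(graded) / len(graded)
        if pass_rate > PASS_RATE_THRESHOLD:
            continue  # drop easier questions
        correct = [r for r, ok in zip(item["responses"], graded) if ok]
        for answer in select_diverse(correct):
            curated.append({"question": item["question"], "answer": answer})
    return curated
```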
You can access our dataset to obtain the 965 training samples.
Evaluation
We evaluate models with SkyThought.
Training Details
NTele-R1-32B-DS was trained from DeepSeek-R1-Distill-Qwen-32B on 8×H800 GPUs.
Training hyperparameters
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 6
- total_train_batch_size: 48
- total_eval_batch_size: 48
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
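For reference, a minimal sketch of equivalent Hugging Face `TrainingArguments` for the hyperparameters above; the output directory, bf16 precision, and all other unlisted settings are assumptions, and the actual training framework is not specified here:

```python
# Sketch: TrainingArguments mirroring the hyperparameters listed above.
# output_dir and bf16 are assumptions, not taken from the model card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ntele-r1-32b-ds",      # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=6,     # 8 GPUs x 1 x 6 = 48 effective batch size
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                         # assumed for H800 training
)
```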