File size: 3,630 Bytes
67ff38a a6eab6c a325535 9b1b5d2 6e5607e 53ee55a 67ff38a 5514a74 67ff38a 5514a74 67ff38a fb9163c 38ae50e fb9163c 67ff38a 011743e 67ff38a f335f91 67ff38a b71f3e0 67ff38a f7d6bfa d29e0f4 ab33ae2 d29e0f4 f7d6bfa ccbda51 d29e0f4 ab33ae2 d29e0f4 7b48509 67ff38a 434f731 fb9163c |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 |
---
datasets:
- PowerInfer/QWQ-LONGCOT-500K
- PowerInfer/LONGCOT-Refine-500K
base_model:
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
language:
- en
library_name: transformers
---
# SmallThinker-3B-preview
We introduce **SmallThinker-3B-preview**, a new model fine-tuned from the [Qwen2.5-3b-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) model.
## Benchmark Performance
| Model | AIME24 | AMC23 | GAOKAO2024_I | GAOKAO2024_II | MMLU_STEM | AMPS_Hard | math_comp |
|---------|--------|-------|--------------|---------------|-----------|-----------|-----------|
| Qwen2.5-3B-Instruct | 6.67 | 45 | 50 | 35.8 | 59.8 | - | - |
| SmallThinker | 16.667 | 57.5 | 64.2 | 57.1 | 68.2 | 70 | 46.8 |
| GPT-4o | 9.3 | - | - | - | 64.2 | 57 | 50 |
Limitation: Due to SmallThinker's current limitations in instruction following, for math_comp we adopt a more lenient evaluation method where only correct answers are required, without constraining responses to follow the specified AAAAA format.
Colab Link: [Colab](https://colab.research.google.com/drive/182q600at0sVw7uX0SXFp6bQI7pyjWXQ2?usp=sharing)
## Intended Use Cases
SmallThinker is designed for the following use cases:
1. **Edge Deployment:** Its small size makes it ideal for deployment on resource-constrained devices.
2. **Draft Model for QwQ-32B-Preview:** SmallThinker can serve as a fast and efficient draft model for the larger QwQ-32B-Preview model. From my test, in llama.cpp we can get 70% speedup (from 40 tokens/s to 70 tokens/s).
## Training Details
The model was trained using 8 H100 GPUs with a global batch size of 16. The specific configuration is as follows:
The SFT (Supervised Fine-Tuning) process was conducted in two phases:
1. First Phase:
- Used only the PowerInfer/QWQ-LONGCOT-500K dataset
- Trained for 1.5 epochs
```
### model
model_name_or_path: /home/syx/Qwen2.5-3B-Instruct
### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json
### dataset
dataset: o1-v2
template: qwen
neat_packing: true
cutoff_len: 16384
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/qwen2-01-qat/full/sft
logging_steps: 1
save_steps: 1000
plot_loss: true
overwrite_output_dir: true
```
2. Second Phase:
- Combined training with PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine datasets
- Continued training for 2 additional epochs
```
### model
model_name_or_path: saves/qwen2-01-qat/full/sft/checkpoint-24000
### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json
### dataset
dataset: o1-v2, o1-v3
template: qwen
neat_packing: true
cutoff_len: 16384
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/qwen2-01-qat/full/sft
logging_steps: 1
save_steps: 1000
plot_loss: true
overwrite_output_dir: true
```
## Limitations & Disclaimer
Please be aware of the following limitations:
* **Language Limitation:** The model has only been trained on English-language datasets, hence its capabilities in other languages are still lacking.
* **Limited Knowledge:** Due to limited SFT data and the model's relatively small scale, its reasoning capabilities are constrained by its knowledge base.
* **Unpredictable Outputs:** The model may produce unexpected outputs due to its size and probabilistic generation paradigm. Users should exercise caution and validate the model's responses.
* **Repetition Issue:** The model tends to repeat itself when answering high-difficulty questions. Please increase the `repetition_penalty` to mitigate this issue. |