---
tags:
- generated_from_trainer
- trl
- sft
- math
- qlora
- gsm8k
- reasoning
license: license
---

# Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for mathematical reasoning tasks. It was trained with [TRL](https://github.com/huggingface/trl) using QLoRA, which maintains high performance while keeping the trainable parameter count and memory footprint low.

## Performance

This fine-tuned 0.6B model achieves impressive performance on mathematical reasoning benchmarks:

| Model | GSM8K Accuracy | Improvement |
|-------|----------------|-------------|
| Base Qwen3-0.6B | 20.17% | - |
| Fine-tuned Qwen3-0.6B | 43.06% | +113% |

Such a significant improvement demonstrates the effectiveness of the fine-tuning approach, achieving results comparable to much larger models.
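
As a rough illustration of how exact-match GSM8K scoring typically works, here is a minimal, self-contained sketch. The last-number extraction heuristic is a common convention and an assumption here, not necessarily the exact scorer behind the numbers above:

```python
import re

def extract_final_number(text: str):
    """Heuristic: take the last number in a response as its final answer."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def gsm8k_accuracy(responses, references):
    """Exact-match accuracy of extracted final answers against references."""
    correct = sum(
        extract_final_number(r) == ref
        for r, ref in zip(responses, references)
    )
    return correct / len(references)

# Toy example with made-up responses
responses = [
    "Step 1: 8 + 9 = 17, so there are 17 apples in total.",
    "After simplifying, the answer is -4.",
    "I estimate about 100.",
]
print(gsm8k_accuracy(responses, [17.0, -4.0, 99.0]))  # 2 of 3 correct
```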

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)

# Solve a math problem
question = "If 8x + 5 = 3x - 15, what is the value of x?"
messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": question}
]

# Format messages using the chat template; add_generation_prompt=True appends
# the assistant header so the model starts a fresh reply
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate a response (do_sample=True so the temperature setting takes effect)
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2,
)

# Decode and print only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

### Example Output

```
To solve this equation, I need to isolate the variable x.

Given equation: 8x + 5 = 3x - 15

Step 1: Subtract 3x from both sides to get all x terms on the left side.
8x + 5 - 3x = 3x - 15 - 3x
5x + 5 = -15

Step 2: Subtract 5 from both sides.
5x + 5 - 5 = -15 - 5
5x = -20

Step 3: Divide both sides by 5 to isolate x.
5x/5 = -20/5
x = -4

Therefore, the value of x is -4.
```

## Training procedure

This model was fine-tuned using Supervised Fine-Tuning (SFT) on a dataset of mathematics problems and step-by-step solutions. The training used QLoRA to efficiently adapt the model while keeping most parameters frozen.

Training configuration:
- QLoRA with rank 16
- 1 epoch
- Learning rate: 2.0e-4
- Batch size: 8 (effective batch size with gradient accumulation: 16)
- BF16 precision

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/bofeng1997-ty/qwen3-finetune/runs/pd4yxl0p)
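
For readers who want a starting point, the configuration above maps onto TRL/PEFT roughly as follows. This is a sketch, not the project's exact training script: values not stated above (`lora_alpha`, `lora_dropout`, the dataset wiring, and the output directory name) are assumptions.

```python
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# QLoRA adapter configuration (rank 16, as stated above)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,        # assumption: not stated in this card
    lora_dropout=0.05,    # assumption
    task_type="CAUSAL_LM",
)

# Hyperparameters from the training configuration above
training_args = SFTConfig(
    output_dir="qwen3-0.6b-math-orca-qlora",  # hypothetical name
    num_train_epochs=1,
    learning_rate=2.0e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 = effective batch size 16
    bf16=True,
)

# trainer = SFTTrainer(
#     model="Qwen/Qwen3-0.6B",
#     args=training_args,
#     train_dataset=train_dataset,  # the prepared math SFT dataset
#     peft_config=peft_config,
# )
# trainer.train()
```

The trainer call is left commented out because it needs the dataset produced by the repository's data-preparation script.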

## Code and Reproducibility

The code for this project is available on GitHub: [https://github.com/tyfeng1997/qwen3-finetune](https://github.com/tyfeng1997/qwen3-finetune)

The repository includes scripts for:
- Data preparation
- Training with QLoRA
- Merging weights
- Evaluation on math benchmarks
- Deployment with vLLM

### Framework versions

- Datasets: 3.5.1
- Tokenizers: 0.21.1

## Usage and Limitations

This model is specifically optimized for mathematical reasoning tasks and may not perform as well on general-purpose tasks. It excels at step-by-step problem solving for high school level mathematics.

## Citations

Cite TRL as:

```bibtex
    publisher = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```

If you use this model in your research, please cite:

```bibtex
@misc{qwen3-0.6B-math,
    author = {Feng, Bo},
    title = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
    year = {2025},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
}
```