---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
model_name: Qwen3-0.6B-math-orca-qlora-10k-ep1
tags:
- generated_from_trainer
- trl
- sft
- math
- qlora
- gsm8k
- reasoning
licence: license
---

# Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for mathematical reasoning tasks. It was trained with [TRL](https://github.com/huggingface/trl) using QLoRA, which keeps the number of trainable parameters low while preserving the base model's capabilities.

## Performance

This fine-tuned 0.6B model shows a large accuracy gain on the GSM8K mathematical reasoning benchmark:

| Model | GSM8K Accuracy | Relative Improvement |
|-------|----------------|----------------------|
| Base Qwen3-0.6B | 20.17% | - |
| Fine-tuned Qwen3-0.6B | 43.06% | +113% |

Fine-tuning more than doubles the base model's GSM8K accuracy, showing that targeted QLoRA adaptation can substantially improve a small model's step-by-step mathematical reasoning.
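GSM8K accuracy of this kind is typically scored by exact match on the final number in the model's response. A minimal sketch of that extraction step (the function name and regex here are illustrative, not the repository's exact evaluation code):

```python
import re

def extract_final_answer(text):
    """Pull the last number out of a model response, as is common
    when scoring GSM8K-style answers by exact match."""
    matches = re.findall(r"-?\d+(?:,\d{3})*(?:\.\d+)?", text)
    if not matches:
        return None
    # Strip thousands separators so "1,234" compares equal to "1234".
    return matches[-1].replace(",", "")

# The model's final line typically states the answer explicitly.
print(extract_final_answer("Therefore, the value of x is -4."))
```

The predicted answer is then compared against the reference answer string; responses with no number at all count as incorrect.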

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)

# Solve a math problem
question = "If 8x + 5 = 3x - 15, what is the value of x?"
messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": question}
]

# Format messages with the chat template and append the assistant prompt
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate a response (temperature only takes effect when sampling is enabled)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

### Example Output

```
To solve this equation, I need to isolate the variable x.

Given equation: 8x + 5 = 3x - 15

Step 1: Subtract 3x from both sides to get all x terms on the left side.
8x + 5 - 3x = 3x - 15 - 3x
5x + 5 = -15

Step 2: Subtract 5 from both sides.
5x + 5 - 5 = -15 - 5
5x = -20

Step 3: Divide both sides by 5 to isolate x.
5x/5 = -20/5
x = -4

Therefore, the value of x is -4.
```

## Training procedure

This model was fine-tuned using Supervised Fine-Tuning (SFT) on a dataset of mathematics problems and step-by-step solutions. The training used QLoRA to efficiently adapt the model while keeping most parameters frozen.

Training configuration:
- QLoRA with rank 16
- 1 epoch
- Learning rate: 2.0e-4
- Batch size: 8 (effective batch size with gradient accumulation: 16)
- BF16 precision
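The configuration above can be sketched with the usual `peft`/`bitsandbytes` objects. This is a hypothetical reconstruction: the rank, learning rate, and BF16 precision come from the card, while `lora_alpha`, `lora_dropout`, and the NF4 quantization type are assumptions based on common QLoRA defaults, not the repository's exact script.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# QLoRA: load the frozen base model in 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # assumed; standard QLoRA choice
    bnb_4bit_compute_dtype=torch.bfloat16,  # BF16 compute, per the card
)

# Low-rank adapters trained on top of the quantized weights
peft_config = LoraConfig(
    r=16,                  # rank 16, per the card
    lora_alpha=32,         # assumed; commonly set to 2x the rank
    lora_dropout=0.05,     # assumed
    task_type="CAUSAL_LM",
)
```

Both configs would then be passed to TRL's `SFTTrainer` alongside the learning rate (2.0e-4) and gradient-accumulation settings listed above.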

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/bofeng1997-ty/qwen3-finetune/runs/pd4yxl0p) 

## Code and Reproducibility

The code for this project is available on GitHub: [https://github.com/tyfeng1997/qwen3-finetune](https://github.com/tyfeng1997/qwen3-finetune)

The repository includes scripts for:
- Data preparation
- Training with QLoRA
- Merging weights
- Evaluation on math benchmarks
- Deployment with vLLM
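For the deployment step, a minimal offline-inference sketch with vLLM's Python API might look like the following. The model name matches this card, but the sampling values are assumptions and this is not the repository's exact deployment script; running it requires a GPU with vLLM installed.

```python
from vllm import LLM, SamplingParams

# Load the merged model (vLLM serves full weights, not LoRA adapters)
llm = LLM(model="tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1")

# Sampling settings are illustrative, mirroring the quick-start example
params = SamplingParams(temperature=0.2, max_tokens=512)

outputs = llm.generate(
    ["If 8x + 5 = 3x - 15, what is the value of x?"],
    params,
)
print(outputs[0].outputs[0].text)
```

Note that the QLoRA adapters must be merged into the base weights first (the repository's merging script covers this) before the model can be served this way.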

### Framework versions

- TRL: 0.18.0.dev0
- Transformers: 4.52.0.dev0
- Pytorch: 2.6.0
- Datasets: 3.5.1
- Tokenizers: 0.21.1

## Usage and Limitations

This model is specifically optimized for mathematical reasoning tasks and may not perform as well on general-purpose tasks. It excels at step-by-step problem solving for high school level mathematics.

## Citations

Cite TRL as:
    
```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```

If you use this model in your research, please cite:

```bibtex
@misc{qwen3-0.6B-math,
    author = {Feng, Bo},
    title = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
    year = {2025},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
}
```