---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
model_name: Qwen3-0.6B-math-orca-qlora-10k-ep1
tags:
- generated_from_trainer
- trl
- sft
- math
- qlora
- gsm8k
- reasoning
licence: license
---
# Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1
This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for mathematical reasoning tasks. It has been trained using [TRL](https://github.com/huggingface/trl) with QLoRA to maintain high performance while keeping the parameter count low.
## Performance
This fine-tuned 0.6B model achieves a substantial gain on the GSM8K mathematical reasoning benchmark:
| Model | GSM8K Accuracy | Relative Improvement |
|-------|----------------|----------------------|
| Base Qwen3-0.6B | 20.17% | - |
| Fine-tuned Qwen3-0.6B | 43.06% | +113% |
More than doubling the base model's accuracy (from 20.17% to 43.06%, a relative gain of roughly 113%) demonstrates the effectiveness of the fine-tuning approach, yielding results comparable to those of much larger models.
## Quick start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)

# Solve a math problem
question = "If 8x + 5 = 3x - 15, what is the value of x?"
messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": question},
]

# Format messages using the chat template and append the assistant generation prompt
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate a response (pass the attention mask along with the input ids)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2,
)

# Decode only the newly generated tokens and print the response
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
### Example Output
```
To solve this equation, I need to isolate the variable x.
Given equation: 8x + 5 = 3x - 15
Step 1: Subtract 3x from both sides to get all x terms on the left side.
8x + 5 - 3x = 3x - 15 - 3x
5x + 5 = -15
Step 2: Subtract 5 from both sides.
5x + 5 - 5 = -15 - 5
5x = -20
Step 3: Divide both sides by 5 to isolate x.
5x/5 = -20/5
x = -4
Therefore, the value of x is -4.
```
## Training procedure
This model was fine-tuned using Supervised Fine-Tuning (SFT) on a dataset of mathematics problems and step-by-step solutions. The training used QLoRA to efficiently adapt the model while keeping most parameters frozen.
Training configuration (a hedged code sketch of this setup follows the list):
- QLoRA with rank 16
- 1 epoch
- Learning rate: 2.0e-4
- Batch size: 8 (effective batch size with gradient accumulation: 16)
- BF16 precision
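For readers who want to approximate this setup, the snippet below is a minimal sketch of a QLoRA SFT run with TRL and PEFT matching the configuration above. The dataset choice, its column mapping, and the LoRA target modules are assumptions inferred from the model name ("math-orca-qlora-10k"), not the exact values from the training scripts in the repository.
```python
# Hedged sketch of the QLoRA SFT setup described above, using TRL + PEFT.
# Dataset, preprocessing, and LoRA target modules are assumptions, not repo values.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_id = "Qwen/Qwen3-0.6B"

# Load the base model in 4-bit so only the LoRA adapters are trained in higher precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# LoRA rank 16, as listed in the configuration above
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)

# Hyperparameters from the card: 1 epoch, lr 2e-4, per-device batch size 8,
# gradient accumulation 2 (effective batch size 16), BF16 precision
training_args = SFTConfig(
    output_dir="qwen3-0.6b-math-qlora",
    num_train_epochs=1,
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    bf16=True,
)

# Assumed source: 10k problems from an Orca-style math dataset, mapped to chat messages
def to_messages(example):
    return {"messages": [
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]}

dataset = load_dataset("microsoft/orca-math-word-problems-200k", split="train[:10000]")
dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```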
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/bofeng1997-ty/qwen3-finetune/runs/pd4yxl0p)
## Code and Reproducibility
The code for this project is available on GitHub: [https://github.com/tyfeng1997/qwen3-finetune](https://github.com/tyfeng1997/qwen3-finetune)
The repository includes scripts for:
- Data preparation
- Training with QLoRA
- Merging weights
- Evaluation on math benchmarks
- Deployment with vLLM (a hedged sketch of the last two steps appears below)
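The scripts themselves live in the repository; as a rough illustration of the merging and deployment steps, the sketch below merges a trained LoRA adapter into the base model with PEFT and runs offline inference on the merged checkpoint with vLLM. The adapter path, output directory, and sampling parameters are placeholder assumptions, not paths from the repo.
```python
# Hedged sketch: merge a QLoRA adapter into the base model, then serve it with vLLM.
# "path/to/qlora-adapter" and the output directory are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "path/to/qlora-adapter").merge_and_unload()
merged.save_pretrained("qwen3-0.6b-math-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B").save_pretrained("qwen3-0.6b-math-merged")

# Offline batch inference with vLLM on the merged checkpoint
from vllm import LLM, SamplingParams

llm = LLM(model="qwen3-0.6b-math-merged")
params = SamplingParams(temperature=0.2, max_tokens=512)
outputs = llm.generate(["Solve step by step: If 8x + 5 = 3x - 15, what is x?"], params)
print(outputs[0].outputs[0].text)
```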
### Framework versions
- TRL: 0.18.0.dev0
- Transformers: 4.52.0.dev0
- PyTorch: 2.6.0
- Datasets: 3.5.1
- Tokenizers: 0.21.1
## Usage and Limitations
This model is specifically optimized for mathematical reasoning tasks and may not perform as well on general-purpose tasks. It excels at step-by-step problem solving for high school level mathematics.
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```
If you use this model in your research, please cite:
```bibtex
@misc{qwen3-0.6B-math,
author = {Feng, Bo},
title = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
year = {2025},
publisher = {GitHub},
howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
}
```