---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
model_name: Qwen3-0.6B-math-orca-qlora-10k-ep1
tags:
- generated_from_trainer
- trl
- sft
- math
- qlora
- gsm8k
- reasoning
---

# Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for mathematical reasoning tasks. It was trained using [TRL](https://github.com/huggingface/trl) with QLoRA, which trains low-rank adapters on top of a 4-bit-quantized base model, so only a small fraction of the parameters is ever updated.

## Performance

The fine-tuned 0.6B model shows a large gain on the GSM8K mathematical reasoning benchmark:

| Model | GSM8K Accuracy | Relative Improvement |
|-------|----------------|----------------------|
| Base Qwen3-0.6B | 20.17% | - |
| Fine-tuned Qwen3-0.6B | 43.06% | +113% |

Fine-tuning more than doubles the base model's accuracy, a strong result for a model of this size. A sketch of a simple evaluation loop appears near the end of this card.

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1")
tokenizer = AutoTokenizer.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1")

# Solve a math problem
question = "If 8x + 5 = 3x - 15, what is the value of x?"
messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": question}
]

# Format messages using the chat template and append the assistant prompt
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response (do_sample=True so the temperature actually takes effect)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2
)

# Decode and print only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

### Example Output

```
To solve this equation, I need to isolate the variable x.

Given equation: 8x + 5 = 3x - 15

Step 1: Subtract 3x from both sides to get all x terms on the left side.
8x + 5 - 3x = 3x - 15 - 3x
5x + 5 = -15

Step 2: Subtract 5 from both sides.
5x + 5 - 5 = -15 - 5
5x = -20

Step 3: Divide both sides by 5 to isolate x.
5x/5 = -20/5
x = -4

Therefore, the value of x is -4.
```

## Training procedure

This model was fine-tuned with Supervised Fine-Tuning (SFT) on a dataset of mathematics problems and step-by-step solutions. Training used QLoRA to adapt the model efficiently while keeping the base weights frozen.

Training configuration:

- QLoRA with rank 16
- 1 epoch
- Learning rate: 2.0e-4
- Batch size: 8 (effective batch size with gradient accumulation: 16)
- BF16 precision

A minimal configuration sketch appears near the end of this card.

[Visualize in Weights & Biases](https://wandb.ai/bofeng1997-ty/qwen3-finetune/runs/pd4yxl0p)

## Code and Reproducibility

The code for this project is available on GitHub: [https://github.com/tyfeng1997/qwen3-finetune](https://github.com/tyfeng1997/qwen3-finetune)

The repository includes scripts for:

- Data preparation
- Training with QLoRA
- Merging adapter weights into the base model
- Evaluation on math benchmarks
- Deployment with vLLM

### Framework versions

- TRL: 0.18.0.dev0
- Transformers: 4.52.0.dev0
- Pytorch: 2.6.0
- Datasets: 3.5.1
- Tokenizers: 0.21.1

## Usage and Limitations

This model is optimized specifically for mathematical reasoning and may not perform as well on general-purpose tasks. It is strongest at step-by-step problem solving for high-school-level mathematics.
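## Training configuration sketch

For reference, here is a minimal sketch of the training setup described above, using TRL's `SFTTrainer` with a PEFT LoRA config and 4-bit quantization. The rank, learning rate, epochs, batch size, and precision come from the card; `lora_alpha`, `lora_dropout`, and the dataset path are assumptions (the actual scripts live in the GitHub repository).

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization of the base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# LoRA adapters of rank 16, as listed in the training configuration;
# alpha and dropout are assumptions, not taken from the card
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Hypothetical dataset file: records with a "messages" list in chat format
dataset = load_dataset("json", data_files="math_sft_10k.jsonl", split="train")

training_args = SFTConfig(
    output_dir="qwen3-0.6b-math-qlora",
    num_train_epochs=1,
    learning_rate=2.0e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # effective batch size 16
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```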
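## Evaluation sketch

The GSM8K numbers above could be checked with a loop along these lines. This is a sketch, not the repository's evaluation script: it assumes the `openai/gsm8k` dataset on the Hub, greedy decoding, and exact match on the final number in the completion, which may differ from how the reported scores were computed.

```python
import re
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("openai/gsm8k", "main", split="test")

def last_number(text):
    # Take the last number in the text as the predicted answer
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1] if nums else None

correct = 0
for example in dataset:
    messages = [
        {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
        {"role": "user", "content": example["question"]},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # GSM8K references end with "#### <answer>"
    reference = example["answer"].split("####")[-1].strip().replace(",", "")
    if last_number(completion) == reference:
        correct += 1

print(f"GSM8K accuracy: {correct / len(dataset):.2%}")
```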
## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```

If you use this model in your research, please cite:

```bibtex
@misc{qwen3-0.6B-math,
    author       = {Feng, Bo},
    title        = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
    year         = {2025},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
}
```