---
license: apache-2.0
base_model: unsloth/Qwen3-1.7B
tags:
- unsloth
- qwen3
- mathematical-reasoning
- sft
- stage1
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Qwen3-1.7B Mathematical Reasoning (SFT Stage 1)

**Stage 1 SFT model** trained following the paper "A Practical Two-Stage Recipe for Mathematical LLMs".

## Training Details

- **Base Model**: unsloth/Qwen3-1.7B
- **Training Method**: Supervised Fine-Tuning (Stage 1)
- **Dataset**: RabotniKuma/Fast-Math-R1-SFT
- **Epochs**: 10
- **System Prompt**: "Please reason step by step, and put your final answer within \boxed{}"

## Usage

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Cbgcbg/qwen3-1.7b-math-sft-stage1-20250723_111953",
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=True,
)

# Example: use the same system prompt as during training.
# Note the escaped backslash so the string contains a literal "\boxed{}".
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is 2+2?"},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

(A sketch for pulling the final `\boxed{}` answer out of the completion is given at the end of this card.)

## Paper Citation

Based on the methodology from:

```
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
```

This is **Stage 1**, focused on maximizing accuracy. Stage 2 (GRPO) will optimize for token efficiency.
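
## Extracting the Final Answer

Because the system prompt asks the model to wrap its final answer in `\boxed{}`, the answer can be recovered from the generated text with a small post-processing step. The snippet below is a minimal sketch that continues from the Usage example above (it reuses `inputs`, `outputs`, and `tokenizer`); the regex assumes the boxed answer contains no nested braces.

```python
import re

# Decode only the newly generated tokens (everything after the prompt).
completion = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Take the last \boxed{...} occurrence as the final answer.
matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
final_answer = matches[-1] if matches else None
print(final_answer)  # e.g. "4" for the 2+2 example above
```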