# Qwen3-1.7B Mathematical Reasoning (SFT Stage 1)
Stage 1 SFT model trained following the recipe in the paper *A Practical Two-Stage Recipe for Mathematical LLMs*.
## Training Details
- Base Model: unsloth/Qwen3-1.7B
- Training Method: Supervised Fine-Tuning (Stage 1)
- Dataset: RabotniKuma/Fast-Math-R1-SFT
- Epochs: 10
- System Prompt: "Please reason step by step, and put your final answer within \boxed{}."
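
The training script itself is not included in this card. Below is a minimal sketch of what a Stage 1 run could look like with unsloth plus TRL's `SFTTrainer`, using only the settings listed above; the LoRA rank/alpha, batch size, learning rate, and dataset preprocessing are assumptions, not reported values (and depending on your TRL version, `tokenizer=` may need to be `processing_class=`):

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the base model exactly as listed above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-1.7B",
    max_seq_length=8192,
    load_in_4bit=True,
)
# Attach LoRA adapters; rank/alpha here are assumptions.
model = FastLanguageModel.get_peft_model(model, r=32, lora_alpha=32)

dataset = load_dataset("RabotniKuma/Fast-Math-R1-SFT", split="train")
# Assumption: the dataset is formatted into chat-templated text elsewhere;
# its column names and preprocessing are not documented in this card.

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        num_train_epochs=10,            # from the card
        per_device_train_batch_size=4,  # assumption
        learning_rate=2e-5,             # assumption
        output_dir="qwen3-1.7b-math-sft-stage1",
    ),
)
trainer.train()
```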
## Usage
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Cbgcbg/qwen3-1.7b-math-sft-stage1-20250723_111953",
    max_seq_length=8192,
    dtype=None,          # auto-detect dtype
    load_in_4bit=True,   # load 4-bit quantized weights
)
FastLanguageModel.for_inference(model)  # switch to inference mode

# Example
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is 2+2?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # append the assistant turn header
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
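
Since the system prompt asks the model to wrap the final answer in `\boxed{}`, you will usually want to pull that answer out of the generation. A minimal sketch; the `extract_boxed_answer` helper is hypothetical, not part of this repo:

```python
import re

def extract_boxed_answer(text: str) -> str | None:
    """Return the content of the last \\boxed{...} in the generation.
    Handles one level of nested braces; hypothetical helper."""
    matches = re.findall(r"\\boxed\{([^{}]*(?:\{[^{}]*\}[^{}]*)*)\}", text)
    return matches[-1] if matches else None

# e.g. extract_boxed_answer(r"... so the answer is \boxed{4}.") -> "4"
```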
## Paper Citation
Based on the methodology from:

> *A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning*
This model is Stage 1, which focuses on maximizing accuracy. Stage 2 (GRPO) will optimize for token efficiency.
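
For context on what Stage 2 targets, here is a minimal sketch of a length-aware reward in the spirit of GRPO token-efficiency training; the function and its constants are illustrative assumptions, not the paper's exact formulation:

```python
def token_efficiency_reward(is_correct: bool, num_tokens: int,
                            budget: int = 8192) -> float:
    """Illustrative GRPO-style reward: incorrect answers score 0;
    correct answers earn a base reward plus a bonus that grows as
    the completion uses fewer tokens. Constants are assumptions."""
    if not is_correct:
        return 0.0
    return 1.0 + max(0.0, 1.0 - num_tokens / budget)
```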