Qwen3-1.7B Mathematical Reasoning (SFT Stage 1)

Stage 1 SFT model trained following the recipe described in the paper "A Practical Two-Stage Recipe for Mathematical LLMs".

Training Details

  • Base Model: unsloth/Qwen3-1.7B
  • Training Method: Supervised Fine-Tuning (Stage 1)
  • Dataset: RabotniKuma/Fast-Math-R1-SFT
  • Epochs: 10
  • System Prompt: "Please reason step by step, and put your final answer within \boxed{}."
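
The training script is not included with this card. As a rough illustration only, a Stage 1 SFT run with the settings above could look like the Unsloth + TRL sketch below; the LoRA settings, batch size, learning rate, and dataset text field are assumptions, while the base model, dataset, sequence length, and epoch count come from this card.

from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit (assumed; matches the inference example below)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-1.7B",
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and target modules are assumptions, not from the card)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("RabotniKuma/Fast-Math-R1-SFT", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        num_train_epochs=10,            # from the card
        per_device_train_batch_size=2,  # assumed
        gradient_accumulation_steps=8,  # assumed
        learning_rate=2e-5,             # assumed
        max_seq_length=8192,
        dataset_text_field="text",      # assumed column name in the SFT dataset
        output_dir="qwen3-1.7b-math-sft-stage1",
    ),
)
trainer.train()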

Usage

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Cbgcbg/qwen3-1.7b-math-sft-stage1-20250723_111953",
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode

# Example
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is 2+2?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # append the assistant turn so the model starts answering
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
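
Because the model is trained to wrap its final answer in \boxed{}, a small helper (hypothetical, not part of this card) can pull the answer out of the generated text:

import re

def extract_boxed_answer(text):
    # Return the contents of the last \boxed{...} span, or None if there is none.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed_answer(r"2 + 2 = 4, so the answer is \boxed{4}."))  # prints: 4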

Paper Citation

Based on the methodology from:

A Practical Two-Stage Recipe for Mathematical LLMs:
Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning

This is Stage 1, which focuses on maximizing accuracy; Stage 2 (GRPO) will optimize for token efficiency.

Model tree for Cbgcbg/qwen3-1.7b-math-sft-stage1-20250723_111953

Qwen/Qwen3-1.7B → unsloth/Qwen3-1.7B → this model