# Qwen3-1.7B Mathematical Reasoning (SFT Stage 1)
Stage 1 SFT model trained following the recipe in the paper *A Practical Two-Stage Recipe for Mathematical LLMs*.
## Training Details
- Base Model: unsloth/Qwen3-1.7B
- Training Method: Supervised Fine-Tuning (Stage 1)
- Dataset: RabotniKuma/Fast-Math-R1-SFT
- Epochs: 10
- System Prompt: "Please reason step by step, and put your final answer within \boxed{}."
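
The training script itself is not included in this card. Below is a minimal sketch of what a Stage 1 run could look like with unsloth plus TRL's `SFTTrainer`, using only the settings listed above; the LoRA rank/alpha, batch size, learning rate, and dataset preprocessing are assumptions, not reported values (and depending on your TRL version, `tokenizer=` may need to be `processing_class=`):

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the base model exactly as listed above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-1.7B",
    max_seq_length=8192,
    load_in_4bit=True,
)
# Attach LoRA adapters; rank/alpha here are assumptions.
model = FastLanguageModel.get_peft_model(model, r=32, lora_alpha=32)

dataset = load_dataset("RabotniKuma/Fast-Math-R1-SFT", split="train")
# Assumption: the dataset is formatted into chat-templated text elsewhere;
# its column names and preprocessing are not documented in this card.

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        num_train_epochs=10,            # from the card
        per_device_train_batch_size=4,  # assumption
        learning_rate=2e-5,             # assumption
        output_dir="qwen3-1.7b-math-sft-stage1",
    ),
)
trainer.train()
```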
## Usage
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Cbgcbg/qwen3-1.7b-math-sft-stage1-20250723_111953",
    max_seq_length=8192,
    dtype=None,          # auto-detect dtype
    load_in_4bit=True,   # load 4-bit quantized weights
)
FastLanguageModel.for_inference(model)  # switch to inference mode

# Example
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is 2+2?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # append the assistant turn header
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
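
Since the system prompt asks the model to wrap the final answer in `\boxed{}`, you will usually want to pull that answer out of the generation. A minimal sketch; the `extract_boxed_answer` helper is hypothetical, not part of this repo:

```python
import re

def extract_boxed_answer(text: str) -> str | None:
    """Return the content of the last \\boxed{...} in the generation.
    Handles one level of nested braces; hypothetical helper."""
    matches = re.findall(r"\\boxed\{([^{}]*(?:\{[^{}]*\}[^{}]*)*)\}", text)
    return matches[-1] if matches else None

# e.g. extract_boxed_answer(r"... so the answer is \boxed{4}.") -> "4"
```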
## Paper Citation
Based on the methodology from:

> *A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning*
This model is Stage 1, which focuses on maximizing accuracy. Stage 2 (GRPO) will optimize for token efficiency.
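
For context on what Stage 2 targets, here is a minimal sketch of a length-aware reward in the spirit of GRPO token-efficiency training; the function and its constants are illustrative assumptions, not the paper's exact formulation:

```python
def token_efficiency_reward(is_correct: bool, num_tokens: int,
                            budget: int = 8192) -> float:
    """Illustrative GRPO-style reward: incorrect answers score 0;
    correct answers earn a base reward plus a bonus that grows as
    the completion uses fewer tokens. Constants are assumptions."""
    if not is_correct:
        return 0.0
    return 1.0 + max(0.0, 1.0 - num_tokens / budget)
```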