---
license: apache-2.0
base_model: unsloth/Qwen3-1.7B
tags:
- unsloth
- qwen3
- mathematical-reasoning
- sft
- stage1
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Qwen3-1.7B Mathematical Reasoning (SFT Stage 1)

**Stage 1 SFT model** trained following the paper "A Practical Two-Stage Recipe for Mathematical LLMs".

## Training Details

- **Base Model**: unsloth/Qwen3-1.7B
- **Training Method**: Supervised Fine-Tuning (Stage 1)
- **Dataset**: RabotniKuma/Fast-Math-R1-SFT
- **Epochs**: 10
- **System Prompt**: "Please reason step by step, and put your final answer within \boxed{}"

## Usage

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Cbgcbg/qwen3-1.7b-math-sft-stage1-20250723_111953",
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=True,
)

# Example: use the same system prompt as during training.
# Note the escaped backslash so the string contains a literal "\boxed{}".
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is 2+2?"},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

(A sketch for pulling the final `\boxed{}` answer out of the completion is given at the end of this card.)

## Paper Citation

Based on the methodology from:

```
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
```

This is **Stage 1**, focused on maximizing accuracy. Stage 2 (GRPO) will optimize for token efficiency.
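
## Extracting the Final Answer

Because the system prompt asks the model to wrap its final answer in `\boxed{}`, the answer can be recovered from the generated text with a small post-processing step. The snippet below is a minimal sketch that continues from the Usage example above (it reuses `inputs`, `outputs`, and `tokenizer`); the regex assumes the boxed answer contains no nested braces.

```python
import re

# Decode only the newly generated tokens (everything after the prompt).
completion = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Take the last \boxed{...} occurrence as the final answer.
matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
final_answer = matches[-1] if matches else None
print(final_answer)  # e.g. "4" for the 2+2 example above
```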