---
license: apache-2.0
base_model: unsloth/Qwen3-1.7B
tags:
- unsloth
- qwen3
- mathematical-reasoning
- sft
- anti-overfitting
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# Qwen3-1.7B Math SFT - Anti-Overfitting Version
Fine-tuned with anti-overfitting measures based on the paper "A Practical Two-Stage Recipe for Mathematical LLMs".
## Training Details
- Base Model: unsloth/Qwen3-1.7B
- Parameters: 1,720,032,256 (all fine-tuned)
- Epochs: 10
- Batch Size: 8
- Learning Rate: 5e-06 (reduced for stability)
- Weight Decay: 0.1 (increased regularization)
- Approach: Full-model fine-tuning with anti-overfitting measures (see the configuration sketch after this list)
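
For reference, here is a minimal sketch of how the hyperparameters above could map onto a Hugging Face `TrainingArguments` object. Only the numbers listed above come from this card; the output directory, precision flag, and evaluation/save strategies are illustrative assumptions.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the hyperparameters listed above; output_dir, bf16,
# and the evaluation/save strategies are assumptions for illustration.
training_args = TrainingArguments(
    output_dir="qwen3-1.7b-math-sft-antioverfitting",
    num_train_epochs=10,
    per_device_train_batch_size=8,
    learning_rate=5e-6,                 # reduced for stability
    weight_decay=0.1,                   # increased regularization
    warmup_ratio=0.1,                   # extended warmup: 10% of steps
    eval_strategy="steps",              # regular evaluation checkpoints
    save_strategy="steps",              # (older transformers versions use evaluation_strategy)
    load_best_model_at_end=True,        # required for early stopping on validation loss
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    bf16=True,
)
```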
## Anti-Overfitting Measures
- Reduced learning rate: 5e-06
- Increased weight decay: 0.1
- Extended warmup: 10% of steps
- Early stopping on validation loss (see the callback sketch after this list)
- Regular evaluation checkpoints
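
Early stopping on validation loss is available out of the box in `transformers` via `EarlyStoppingCallback`. The sketch below shows one plausible way to wire it up; the trainer class, datasets, and patience value are assumptions, not details documented for this run.

```python
from transformers import EarlyStoppingCallback, Trainer

# Hypothetical wiring: `model`, `train_dataset`, and `eval_dataset` are assumed
# to be defined elsewhere; a patience of 3 evaluations is an illustrative choice.
trainer = Trainer(
    model=model,
    args=training_args,  # e.g. the TrainingArguments sketched above
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```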
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Cbgcbg/qwen3-1.7b-math-sft-antioverfitting-20250724_165951",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Cbgcbg/qwen3-1.7b-math-sft-antioverfitting-20250724_165951")

messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is 2+2?"},
]

# Build the prompt with the chat template and move it to the model's device
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
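
Since the system prompt asks the model to put its final answer inside `\boxed{}`, a small illustrative helper (not part of this repository) can pull that answer out of the decoded `response` string above:

```python
import re

# Illustrative helper, assuming simple non-nested \boxed{...} spans.
def extract_boxed_answer(text: str) -> str | None:
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed_answer(response))  # expected "4" for the 2+2 example
```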
Training timestamp: 20250724_165951