---
license: apache-2.0
base_model: unsloth/Qwen3-8B
tags:
- unsloth
- qwen3
- mathematical-reasoning
- epoch11-checkpoint
- full-finetuning
- paper-config
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Qwen3-8B Math SFT - Epoch 11 Checkpoint

**Full parameter fine-tuning checkpoint** from mathematical reasoning training.

## 📊 Training Details

- **Base Model**: unsloth/Qwen3-8B (full precision)
- **Training Method**: Full parameter fine-tuning (92.4% of parameters trained)
- **Progress**: Epoch 11/20 (55% complete)
- **Dataset**: Paper's official dataset (7,110 training samples)
- **Configuration**: Paper's exact Stage 1 SFT settings

## 🔧 Training Configuration

- **Batch Size**: 1 per device × 16 accumulation steps = 16 effective
- **Learning Rate**: 1e-5 (paper's exact value)
- **Max Sequence Length**: 24,000 (paper's exact value)
- **Optimizer**: paged_adamw_8bit
- **Scheduler**: cosine
- **Epochs**: 20 total

An illustrative mapping of these settings onto a Hugging Face training configuration is sketched at the end of this card.

## 🎯 Expected Performance

### Epoch 11 Characteristics

**Mid-stage checkpoint**: strong mathematical reasoning capability, good accuracy on most problems, and well-structured solutions.

## 📈 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Cbgcbg/qwen3-8b-math-full-sft-epoch11-20250725_161659",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("Cbgcbg/qwen3-8b-math-full-sft-epoch11-20250725_161659")

# Build a chat-formatted prompt
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "Find the derivative of f(x) = x^3 + 2x^2 - 5x + 3"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
print(response)
```

## 🔗 Related Models

- **Paper Source**: "A Practical Two-Stage Recipe for Mathematical LLMs"
- **Training Approach**: Full parameter fine-tuning (Stage 1 SFT only)
- **Final Model**: Will be released after all 20 epochs complete

## 📅 Training Timeline

- **Started**: 20250725_161659
- **Current**: Epoch 11/20 checkpoint
- **Status**: Intermediate checkpoint

---

*This model follows the exact configuration of the paper's Stage 1 SFT approach, using full parameter fine-tuning for mathematical reasoning performance.*
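
## 🛠️ Illustrative Training Setup

The sketch below shows how the hyperparameters listed under Training Configuration could be expressed with `transformers.TrainingArguments`. It is **not** the original training script: the output directory, dataset handling, and the use of the plain `Trainer` API are assumptions, while the numeric values mirror this card.

```python
# Illustrative sketch only, assuming a standard Hugging Face fine-tuning loop.
# Dataset path and Trainer usage are assumptions; the numbers mirror the card above.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "unsloth/Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)

args = TrainingArguments(
    output_dir="qwen3-8b-math-full-sft",   # assumed name
    per_device_train_batch_size=1,          # 1 sample per device
    gradient_accumulation_steps=16,         # 1 × 16 = 16 effective batch size
    learning_rate=1e-5,                     # paper's Stage 1 SFT learning rate
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",               # requires bitsandbytes
    num_train_epochs=20,
    logging_steps=10,
    save_strategy="epoch",                  # yields per-epoch checkpoints like this one
)

# train_dataset is assumed to be the paper's 7,110-sample SFT set, already
# tokenized with sequences truncated to the 24,000-token maximum length.
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```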