---
license: apache-2.0
tags:
  - unsloth
  - trl
  - sft
  - math
  - reasoning
datasets:
  - unsloth/OpenMathReasoning-mini
language:
  - en
base_model:
  - Qwen/Qwen3-0.6B
pipeline_tag: text-generation
library_name: transformers
---

# Qwen3-0.6B-Math-Expert

This project performs full fine-tuning of the Qwen3-0.6B language model to strengthen its mathematical problem-solving and reasoning capabilities. Training was conducted exclusively on the OpenMathReasoning-mini dataset, with the model weights held and updated in bfloat16 (bf16) precision.

## Training Procedure

1. **Dataset Preparation**
   - The `unsloth/OpenMathReasoning-mini` dataset was used.
   - Each example was formatted in Chain-of-Thought (CoT) style, pairing a math problem with its step-by-step intermediate reasoning.
2. **Model Loading and Configuration**
   - The Qwen3-0.6B base weights were loaded via the `unsloth` library in bf16 precision.
   - All layers were updated (`full_finetuning=True`) to adapt the model for mathematical reasoning.
3. **Supervised Fine-Tuning**
   - Training used the Hugging Face TRL library's Supervised Fine-Tuning (SFT) approach.
   - The model was trained to generate both the reasoning chain and the final answer; see the sketch after this list.
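A minimal end-to-end sketch of the procedure above is shown below, assuming recent `unsloth`, `trl`, and `datasets` releases. The split and column names (`cot`, `problem`, `generated_solution`) follow the dataset card and should be verified, and the hyperparameters are illustrative placeholders rather than the exact values used to train this model.

```python
# Sketch of the training pipeline described above (not the exact script).
# Assumptions: recent unsloth + trl; the dataset's "cot" split with
# "problem" / "generated_solution" columns; illustrative hyperparameters.
import torch
from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Step 1: dataset preparation — render each (problem, CoT solution) pair
# as a single training text via the tokenizer's chat template.
dataset = load_dataset("unsloth/OpenMathReasoning-mini", split="cot")

# Step 2: model loading — full fine-tuning (no LoRA adapters) in bf16.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-0.6B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=False,
    full_finetuning=True,  # update all layers, as described above
)

def to_text(example):
    messages = [
        {"role": "user", "content": example["problem"]},
        {"role": "assistant", "content": example["generated_solution"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_text)

# Step 3: supervised fine-tuning with TRL's SFTTrainer.
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # use tokenizer= on older trl versions
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```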

## Purpose and Outcome

- The model's reasoning capacity on math problems was significantly improved through single-dataset, full fine-tuning in bf16 precision.
- Outputs include both the intermediate reasoning steps and the final solution, making results transparent and interpretable. A usage example is shown below.
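As an illustration, the model can be queried with the standard `transformers` generation API. This is a sketch assuming the published repository id is `suayptalha/Qwen3-0.6B-Math-Expert`:

```python
# Inference sketch using plain transformers.
# Assumption: the repo id is "suayptalha/Qwen3-0.6B-Math-Expert".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "suayptalha/Qwen3-0.6B-Math-Expert"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user",
     "content": "What is the sum of the first 10 positive odd numbers?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# The completion contains the intermediate reasoning followed by the answer.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```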

## License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

## Support

Buy Me A Coffee