---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
model_name: Qwen3-0.6B-math-orca-qlora-10k-ep1
tags:
- generated_from_trainer
- trl
- sft
- math
- qlora
- gsm8k
- reasoning
---

# Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for mathematical reasoning tasks. It was trained using [TRL](https://github.com/huggingface/trl) with QLoRA, which trains low-rank adapters on top of a 4-bit-quantized base model, so only a small fraction of the parameters is ever updated.

## Performance

The fine-tuned 0.6B model shows a large gain on the GSM8K mathematical reasoning benchmark:

| Model | GSM8K Accuracy | Relative Improvement |
|-------|----------------|----------------------|
| Base Qwen3-0.6B | 20.17% | - |
| Fine-tuned Qwen3-0.6B | 43.06% | +113% |

Fine-tuning more than doubles the base model's accuracy, a strong result for a model of this size. A sketch of a simple evaluation loop appears near the end of this card.

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1")
tokenizer = AutoTokenizer.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1")

# Solve a math problem
question = "If 8x + 5 = 3x - 15, what is the value of x?"
messages = [
    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
    {"role": "user", "content": question}
]

# Format messages using the chat template and append the assistant prompt
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response (do_sample=True so the temperature actually takes effect)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2
)

# Decode and print only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

### Example Output

```
To solve this equation, I need to isolate the variable x.

Given equation: 8x + 5 = 3x - 15

Step 1: Subtract 3x from both sides to get all x terms on the left side.
8x + 5 - 3x = 3x - 15 - 3x
5x + 5 = -15

Step 2: Subtract 5 from both sides.
5x + 5 - 5 = -15 - 5
5x = -20

Step 3: Divide both sides by 5 to isolate x.
5x/5 = -20/5
x = -4

Therefore, the value of x is -4.
```

## Training procedure

This model was fine-tuned with Supervised Fine-Tuning (SFT) on a dataset of mathematics problems and step-by-step solutions. Training used QLoRA to adapt the model efficiently while keeping the base weights frozen.

Training configuration:

- QLoRA with rank 16
- 1 epoch
- Learning rate: 2.0e-4
- Batch size: 8 (effective batch size with gradient accumulation: 16)
- BF16 precision

A minimal configuration sketch appears near the end of this card.

[Visualize in Weights & Biases](https://wandb.ai/bofeng1997-ty/qwen3-finetune/runs/pd4yxl0p)

## Code and Reproducibility

The code for this project is available on GitHub: [https://github.com/tyfeng1997/qwen3-finetune](https://github.com/tyfeng1997/qwen3-finetune)

The repository includes scripts for:

- Data preparation
- Training with QLoRA
- Merging adapter weights into the base model
- Evaluation on math benchmarks
- Deployment with vLLM

### Framework versions

- TRL: 0.18.0.dev0
- Transformers: 4.52.0.dev0
- Pytorch: 2.6.0
- Datasets: 3.5.1
- Tokenizers: 0.21.1

## Usage and Limitations

This model is optimized specifically for mathematical reasoning and may not perform as well on general-purpose tasks. It is strongest at step-by-step problem solving for high-school-level mathematics.
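## Training configuration sketch

For reference, here is a minimal sketch of the training setup described above, using TRL's `SFTTrainer` with a PEFT LoRA config and 4-bit quantization. The rank, learning rate, epochs, batch size, and precision come from the card; `lora_alpha`, `lora_dropout`, and the dataset path are assumptions (the actual scripts live in the GitHub repository).

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization of the base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# LoRA adapters of rank 16, as listed in the training configuration;
# alpha and dropout are assumptions, not taken from the card
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Hypothetical dataset file: records with a "messages" list in chat format
dataset = load_dataset("json", data_files="math_sft_10k.jsonl", split="train")

training_args = SFTConfig(
    output_dir="qwen3-0.6b-math-qlora",
    num_train_epochs=1,
    learning_rate=2.0e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # effective batch size 16
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```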
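## Evaluation sketch

The GSM8K numbers above could be checked with a loop along these lines. This is a sketch, not the repository's evaluation script: it assumes the `openai/gsm8k` dataset on the Hub, greedy decoding, and exact match on the final number in the completion, which may differ from how the reported scores were computed.

```python
import re
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("openai/gsm8k", "main", split="test")

def last_number(text):
    # Take the last number in the text as the predicted answer
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1] if nums else None

correct = 0
for example in dataset:
    messages = [
        {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
        {"role": "user", "content": example["question"]},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # GSM8K references end with "#### <answer>"
    reference = example["answer"].split("####")[-1].strip().replace(",", "")
    if last_number(completion) == reference:
        correct += 1

print(f"GSM8K accuracy: {correct / len(dataset):.2%}")
```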
## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```

If you use this model in your research, please cite:

```bibtex
@misc{qwen3-0.6B-math,
    author       = {Feng, Bo},
    title        = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
    year         = {2025},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
}
```