tyfeng1997 committed (verified) · Commit 2692980 · Parent(s): 2f25ff5

Update README.md

Files changed (1): README.md +97 -9
README.md CHANGED
@@ -6,31 +6,105 @@ tags:
 - generated_from_trainer
 - trl
 - sft
 licence: license
 ---

 # Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

-This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B).
-It has been trained using [TRL](https://github.com/huggingface/trl).

 ## Quick start

 ```python
-from transformers import pipeline

-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
 ```

 ## Training procedure

 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/bofeng1997-ty/qwen3-finetune/runs/pd4yxl0p)

-This model was trained with SFT.

 ### Framework versions

@@ -40,9 +114,11 @@ This model was trained with SFT.
 - Datasets: 3.5.1
 - Tokenizers: 0.21.1

-## Citations

 Cite TRL as:

@@ -55,4 +131,16 @@ Cite TRL as:
 publisher = {GitHub},
 howpublished = {\url{https://github.com/huggingface/trl}}
 }
 ```
 
 - generated_from_trainer
 - trl
 - sft
+- math
+- qlora
+- gsm8k
+- reasoning
 licence: license
 ---

 # Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

+This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for mathematical reasoning tasks. It was trained with [TRL](https://github.com/huggingface/trl) using QLoRA, which keeps the number of trained parameters low while maintaining high performance.
+
+## Performance
+
+This fine-tuned 0.6B model achieves strong results on mathematical reasoning benchmarks:
+
+| Model | GSM8K Accuracy | Improvement |
+|-------|----------------|-------------|
+| Base Qwen3-0.6B | 20.17% | - |
+| Fine-tuned Qwen3-0.6B | 43.06% | +113% (relative) |
+
+This improvement demonstrates the effectiveness of the fine-tuning approach, with results comparable to much larger models.
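For clarity, the "+113%" in the table is the gain relative to the base model's accuracy; the absolute gain is about 22.9 percentage points. A quick arithmetic check:

```python
# Check the improvement figures quoted in the table above
base_acc = 20.17   # GSM8K accuracy of base Qwen3-0.6B, in percent
tuned_acc = 43.06  # GSM8K accuracy of the fine-tuned model, in percent

relative_gain = (tuned_acc - base_acc) / base_acc * 100
absolute_gain = tuned_acc - base_acc

print(f"relative gain: +{relative_gain:.0f}%")        # +113%
print(f"absolute gain: +{absolute_gain:.2f} points")  # +22.89 points
```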

 ## Quick start

 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Load the model and tokenizer
+model = AutoModelForCausalLM.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)
+
+# Solve a math problem
+question = "If 8x + 5 = 3x - 15, what is the value of x?"
+messages = [
+    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
+    {"role": "user", "content": question}
+]
+
+# Format messages using the chat template and append the assistant turn marker
+input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+# Generate response (enable sampling so the temperature setting takes effect)
+outputs = model.generate(
+    inputs["input_ids"],
+    max_new_tokens=512,
+    do_sample=True,
+    temperature=0.2
+)
+
+# Decode and print only the newly generated tokens
+response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+print(response)
+```
+
+### Example Output
+
+```
+To solve this equation, I need to isolate the variable x.
+
+Given equation: 8x + 5 = 3x - 15
+
+Step 1: Subtract 3x from both sides to get all x terms on the left side.
+8x + 5 - 3x = 3x - 15 - 3x
+5x + 5 = -15
+
+Step 2: Subtract 5 from both sides.
+5x + 5 - 5 = -15 - 5
+5x = -20
+
+Step 3: Divide both sides by 5 to isolate x.
+5x/5 = -20/5
+x = -4
+
+Therefore, the value of x is -4.
 ```
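The answer in the example checks out: substituting x = -4 back into the original equation makes both sides equal.

```python
# Verify the example solution by substituting x = -4 into 8x + 5 = 3x - 15
x = -4
lhs = 8 * x + 5   # 8(-4) + 5 = -27
rhs = 3 * x - 15  # 3(-4) - 15 = -27
assert lhs == rhs
print(f"8x + 5 = {lhs}, 3x - 15 = {rhs}")  # both sides equal -27
```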

 ## Training procedure

+This model was fine-tuned with Supervised Fine-Tuning (SFT) on a dataset of mathematics problems and step-by-step solutions. Training used QLoRA to adapt the model efficiently while keeping most parameters frozen.
+
+Training configuration:
+- QLoRA with rank 16
+- 1 epoch
+- Learning rate: 2.0e-4
+- Batch size: 8 (effective batch size with gradient accumulation: 16)
+- BF16 precision
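A TRL/PEFT setup matching the list above would look roughly like the sketch below. The rank and the training hyperparameters come from the list; `lora_alpha`, `target_modules`, and the 4-bit quantization settings are assumptions, so check the linked repository for the exact values.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig

# 4-bit base-model quantization for QLoRA (settings assumed, not stated in the card)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter: rank 16 as stated above; alpha and target modules are assumptions
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# SFT hyperparameters matching the list above
training_args = SFTConfig(
    num_train_epochs=1,
    learning_rate=2.0e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 = effective batch size 16
    bf16=True,
    output_dir="Qwen3-0.6B-math-orca-qlora-10k-ep1",
)
```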
+
 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/bofeng1997-ty/qwen3-finetune/runs/pd4yxl0p)

+## Code and Reproducibility
+
+The code for this project is available on GitHub: [https://github.com/tyfeng1997/qwen3-finetune](https://github.com/tyfeng1997/qwen3-finetune)

+The repository includes scripts for:
+- Data preparation
+- Training with QLoRA
+- Merging weights
+- Evaluation on math benchmarks
+- Deployment with vLLM
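As a reference for the "Merging weights" step, folding a QLoRA adapter back into the base model is typically done with PEFT's `merge_and_unload`; the adapter and output paths below are placeholders, and the repository's own merge script may differ.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the full-precision base model and apply the trained adapter
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")  # placeholder path

# Fold the LoRA weights into the base weights and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("Qwen3-0.6B-math-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B").save_pretrained("Qwen3-0.6B-math-merged")
```

The merged checkpoint can then be served directly, e.g. with `vllm serve Qwen3-0.6B-math-merged`.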
108
 
109
  ### Framework versions
110
 
 
114
  - Datasets: 3.5.1
115
  - Tokenizers: 0.21.1
116
 
117
+ ## Usage and Limitations
118
 
119
+ This model is specifically optimized for mathematical reasoning tasks and may not perform as well on general-purpose tasks. It excels at step-by-step problem solving for high school level mathematics.
120
 
121
+ ## Citations
122
 
123
  Cite TRL as:
124
 
 
131
  publisher = {GitHub},
132
  howpublished = {\url{https://github.com/huggingface/trl}}
133
  }
134
+ ```
135
+
136
+ If you use this model in your research, please cite:
137
+
138
+ ```bibtex
139
+ @misc{qwen3-0.6B-math,
140
+ author = {Feng, Bo},
141
+ title = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
142
+ year = {2025},
143
+ publisher = {GitHub},
144
+ howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
145
+ }
146
  ```