tyfeng1997 committed (verified) · Commit 2692980 · Parent(s): 2f25ff5

Update README.md

Files changed (1): README.md +97 -9
README.md CHANGED
@@ -6,31 +6,105 @@ tags:
 - generated_from_trainer
 - trl
 - sft
 licence: license
 ---

 # Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

-This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B).
-It has been trained using [TRL](https://github.com/huggingface/trl).

 ## Quick start

 ```python
-from transformers import pipeline

-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
 ```

 ## Training procedure

 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/bofeng1997-ty/qwen3-finetune/runs/pd4yxl0p)

-This model was trained with SFT.

 ### Framework versions

@@ -40,9 +114,11 @@ This model was trained with SFT.
 - Datasets: 3.5.1
 - Tokenizers: 0.21.1

-## Citations

 Cite TRL as:

@@ -55,4 +131,16 @@ Cite TRL as:
 publisher = {GitHub},
 howpublished = {\url{https://github.com/huggingface/trl}}
 }
 ```
 
 - generated_from_trainer
 - trl
 - sft
+- math
+- qlora
+- gsm8k
+- reasoning
 licence: license
 ---

 # Model Card for Qwen3-0.6B-math-orca-qlora-10k-ep1

+This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) specialized for mathematical reasoning tasks. It was trained with [TRL](https://github.com/huggingface/trl) using QLoRA, which keeps the number of trained parameters low while maintaining high performance.
+
+## Performance
+
+This fine-tuned 0.6B model achieves strong results on mathematical reasoning benchmarks:
+
+| Model | GSM8K Accuracy | Improvement |
+|-------|----------------|-------------|
+| Base Qwen3-0.6B | 20.17% | - |
+| Fine-tuned Qwen3-0.6B | 43.06% | +113% (relative) |
+
+This improvement demonstrates the effectiveness of the fine-tuning approach, with results comparable to much larger models.
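For clarity, the "+113%" in the table is the gain relative to the base model's accuracy; the absolute gain is about 22.9 percentage points. A quick arithmetic check:

```python
# Check the improvement figures quoted in the table above
base_acc = 20.17   # GSM8K accuracy of base Qwen3-0.6B, in percent
tuned_acc = 43.06  # GSM8K accuracy of the fine-tuned model, in percent

relative_gain = (tuned_acc - base_acc) / base_acc * 100
absolute_gain = tuned_acc - base_acc

print(f"relative gain: +{relative_gain:.0f}%")        # +113%
print(f"absolute gain: +{absolute_gain:.2f} points")  # +22.89 points
```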

 ## Quick start

 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Load the model and tokenizer
+model = AutoModelForCausalLM.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("tyfeng1997/Qwen3-0.6B-math-orca-qlora-10k-ep1", trust_remote_code=True)
+
+# Solve a math problem
+question = "If 8x + 5 = 3x - 15, what is the value of x?"
+messages = [
+    {"role": "system", "content": "Solve the given math problem step by step, showing all your work."},
+    {"role": "user", "content": question}
+]
+
+# Format messages using the chat template and append the assistant turn marker
+input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+# Generate response (enable sampling so the temperature setting takes effect)
+outputs = model.generate(
+    inputs["input_ids"],
+    max_new_tokens=512,
+    do_sample=True,
+    temperature=0.2
+)
+
+# Decode and print only the newly generated tokens
+response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+print(response)
+```
+
+### Example Output
+
+```
+To solve this equation, I need to isolate the variable x.
+
+Given equation: 8x + 5 = 3x - 15
+
+Step 1: Subtract 3x from both sides to get all x terms on the left side.
+8x + 5 - 3x = 3x - 15 - 3x
+5x + 5 = -15
+
+Step 2: Subtract 5 from both sides.
+5x + 5 - 5 = -15 - 5
+5x = -20
+
+Step 3: Divide both sides by 5 to isolate x.
+5x/5 = -20/5
+x = -4
+
+Therefore, the value of x is -4.
 ```
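The answer in the example checks out: substituting x = -4 back into the original equation makes both sides equal.

```python
# Verify the example solution by substituting x = -4 into 8x + 5 = 3x - 15
x = -4
lhs = 8 * x + 5   # 8(-4) + 5 = -27
rhs = 3 * x - 15  # 3(-4) - 15 = -27
assert lhs == rhs
print(f"8x + 5 = {lhs}, 3x - 15 = {rhs}")  # both sides equal -27
```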

 ## Training procedure

+This model was fine-tuned with Supervised Fine-Tuning (SFT) on a dataset of mathematics problems and step-by-step solutions. Training used QLoRA to adapt the model efficiently while keeping most parameters frozen.
+
+Training configuration:
+- QLoRA with rank 16
+- 1 epoch
+- Learning rate: 2.0e-4
+- Batch size: 8 (effective batch size with gradient accumulation: 16)
+- BF16 precision
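A TRL/PEFT setup matching the list above would look roughly like the sketch below. The rank and the training hyperparameters come from the list; `lora_alpha`, `target_modules`, and the 4-bit quantization settings are assumptions, so check the linked repository for the exact values.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig

# 4-bit base-model quantization for QLoRA (settings assumed, not stated in the card)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter: rank 16 as stated above; alpha and target modules are assumptions
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# SFT hyperparameters matching the list above
training_args = SFTConfig(
    num_train_epochs=1,
    learning_rate=2.0e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 = effective batch size 16
    bf16=True,
    output_dir="Qwen3-0.6B-math-orca-qlora-10k-ep1",
)
```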
+
 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/bofeng1997-ty/qwen3-finetune/runs/pd4yxl0p)

+## Code and Reproducibility
+
+The code for this project is available on GitHub: [https://github.com/tyfeng1997/qwen3-finetune](https://github.com/tyfeng1997/qwen3-finetune)

+The repository includes scripts for:
+- Data preparation
+- Training with QLoRA
+- Merging weights
+- Evaluation on math benchmarks
+- Deployment with vLLM
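As a reference for the "Merging weights" step, folding a QLoRA adapter back into the base model is typically done with PEFT's `merge_and_unload`; the adapter and output paths below are placeholders, and the repository's own merge script may differ.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the full-precision base model and apply the trained adapter
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")  # placeholder path

# Fold the LoRA weights into the base weights and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("Qwen3-0.6B-math-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B").save_pretrained("Qwen3-0.6B-math-merged")
```

The merged checkpoint can then be served directly, e.g. with `vllm serve Qwen3-0.6B-math-merged`.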
108
 
109
  ### Framework versions
110
 
 
114
  - Datasets: 3.5.1
115
  - Tokenizers: 0.21.1
116
 
117
+ ## Usage and Limitations
118
 
119
+ This model is specifically optimized for mathematical reasoning tasks and may not perform as well on general-purpose tasks. It excels at step-by-step problem solving for high school level mathematics.
120
 
121
+ ## Citations
122
 
123
  Cite TRL as:
124
 
 
131
  publisher = {GitHub},
132
  howpublished = {\url{https://github.com/huggingface/trl}}
133
  }
134
+ ```
135
+
136
+ If you use this model in your research, please cite:
137
+
138
+ ```bibtex
139
+ @misc{qwen3-0.6B-math,
140
+ author = {Feng, Bo},
141
+ title = {Qwen3-0.6B-math: Fine-tuned small language model for mathematical reasoning},
142
+ year = {2025},
143
+ publisher = {GitHub},
144
+ howpublished = {\url{https://github.com/tyfeng1997/qwen3-finetune}}
145
+ }
146
  ```