Add comprehensive model card with usage instructions and evaluation results
README.md (changed)
This LoRA adapter enhances google/gemma-3-1b-it with structured reasoning capabilities.
- **Training Method**: GRPO (Group Relative Policy Optimization)
- **LoRA Rank**: 64
- **LoRA Alpha**: 128
- **Training Samples**: 107
- **Thinking Tag Usage**: 40.0%
- **Average Quality Score**: 0.00
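For context, LoRA scales the learned low-rank update by alpha / rank, so the settings above give a 2× effective scaling. A minimal sketch of the arithmetic (the layer dimension below is hypothetical, chosen only for illustration):

```python
# LoRA replaces W @ x with W @ x + (alpha / r) * B @ A @ x,
# so alpha / r controls how strongly the adapter perturbs the base weights.
lora_rank = 64    # r, from the card above
lora_alpha = 128  # alpha, from the card above

scaling = lora_alpha / lora_rank
print(f"effective LoRA scaling: {scaling}")  # 2.0

# Trainable parameters for one adapted weight of shape (d_out, d_in):
# A is (r, d_in) and B is (d_out, r), i.e. r * (d_in + d_out) params.
d_out, d_in = 1152, 1152  # hypothetical layer size, illustration only
print(f"params per adapted matrix: {lora_rank * (d_in + d_out):,}")
```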
## 🔧 Usage

```python
# model and tokenizer are assumed to be loaded earlier in the card (not shown in this diff)
prompt = '''Problem: If a train travels 120 miles in 2 hours, then increases its speed by 30 ...

Response:'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.2)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
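Since the card reports a thinking-tag usage rate, it can be useful to split a generated response into its reasoning and its final answer. A hedged sketch, assuming the adapter emits `<think>…</think>` tags (the exact tag format is an assumption, not stated in this card):

```python
import re

def split_thinking(response: str) -> tuple[str, str]:
    """Return (thinking, answer); thinking is '' if no tags are present."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if not match:
        return "", response.strip()
    thinking = match.group(1).strip()
    answer = (response[:match.start()] + response[match.end():]).strip()
    return thinking, answer

# Toy example, for illustration only:
demo = "<think>120 mi / 2 h = 60 mph</think> The train's initial speed is 60 mph."
thinking, answer = split_thinking(demo)
print(thinking)  # 120 mi / 2 h = 60 mph
print(answer)    # The train's initial speed is 60 mph.
```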
The model was trained on self-generated reasoning problems across multiple domains.

## 🔬 Evaluation

The adapter was evaluated on diverse reasoning tasks:

- Thinking tag usage rate: 40.0%
- Average reasoning quality score: 0.00
- Response comprehensiveness: 0 words average
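Metrics of this shape can be computed from a batch of sampled responses in a few lines. A sketch, under the assumption that "usage rate" means the fraction of responses containing a thinking tag and "comprehensiveness" is a plain whitespace word count (the toy responses below are stand-ins, not real model outputs):

```python
responses = [  # hypothetical outputs, for illustration only
    "<think>reason step by step</think> The answer is 180 miles.",
    "The answer is 180 miles.",
]

# Fraction of responses that used the thinking tag at all.
tag_usage_rate = sum("<think>" in r for r in responses) / len(responses)
# Mean response length in whitespace-separated words.
avg_words = sum(len(r.split()) for r in responses) / len(responses)

print(f"Thinking tag usage: {tag_usage_rate:.1%}")
print(f"Average length: {avg_words:.1f} words")
```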