Update README.md
README.md
CHANGED
@@ -1,5 +1,5 @@
 ---
-base_model:
+base_model: gemma-3-27b-it
 tags:
 - text-generation-inference
 - transformers
@@ -21,7 +21,7 @@ inference:
 # Gemma 3 27B GRPO Reasoning Model
 
 ## Model Description
-- **Developed by:**
+- **Developed by:** klei1
 - **Model type:** Gemma 3 27B fine-tuned with GRPO for reasoning tasks
 - **License:** apache-2.0
 - **Finetuned from model:** Google's Gemma 3 27B instruction-tuned model
@@ -38,9 +38,6 @@ This model was fine-tuned using GRPO (Group Relative Policy Optimization),
 2. Produce correct mathematical solutions
 3. Show clear step-by-step reasoning processes
 
-### Training Data
-The model was fine-tuned on the GSM8K dataset containing grade school math problems, teaching the model to break down problems, think step-by-step, and arrive at accurate solutions.
-
 ### Special Formatting
 The model has been trained to follow a specific reasoning format:
 - Working out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
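The tag convention kept in the Special Formatting section above can be post-processed with plain pattern matching. A minimal sketch, assuming the tags appear exactly as documented; `split_response` and the sample string are illustrative and not part of the released model or card:

```python
import re

# Tags documented in the model card's Special Formatting section.
WORKING_RE = re.compile(r"<start_working_out>(.*?)<end_working_out>", re.DOTALL)
SOLUTION_RE = re.compile(r"<SOLUTION>(.*?)</SOLUTION>", re.DOTALL)

def split_response(text: str):
    """Return (working_out, solution) extracted from a model response, or None if a span is missing."""
    working = WORKING_RE.search(text)
    solution = SOLUTION_RE.search(text)
    return (
        working.group(1).strip() if working else None,
        solution.group(1).strip() if solution else None,
    )

# Hypothetical model output used only to demonstrate the parser.
sample = (
    "<start_working_out>There are 3 boxes with 4 apples each, "
    "so 3 * 4 = 12 apples.<end_working_out>"
    "<SOLUTION>12</SOLUTION>"
)
print(split_response(sample))
```

`re.DOTALL` lets the working-out span cover multiple lines, which step-by-step answers usually do.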
@@ -65,17 +62,6 @@ This model is available in two formats:
 - 5 sliding + 1 global attention pattern
 - 1024 sliding window attention
 
-## System Prompt
-
-To get the best results from this model, use this system prompt:
-
-```
-You are given a problem.
-Think about the problem and provide your working out.
-Place it between <start_working_out> and <end_working_out>.
-Then, provide your solution between <SOLUTION></SOLUTION>
-```
-
 ## Limitations
 - While this model excels at reasoning tasks, particularly mathematical problems, it may still occasionally provide incorrect solutions for complex problems.
 - The model's performance might vary depending on problem complexity and wording.
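Although this revision removes the System Prompt section, the prompt it contained matches the tag format the card still documents, so a usage sketch may help readers of the diff. This assumes a standard transformers chat checkpoint; the repo id below is a placeholder, and if the checkpoint's chat template rejects a `system` role the prompt can be prepended to the user message instead:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual fine-tuned checkpoint.
MODEL_ID = "your-namespace/gemma-3-27b-grpo-reasoning"

# System prompt quoted from the section removed in this revision.
SYSTEM_PROMPT = (
    "You are given a problem.\n"
    "Think about the problem and provide your working out.\n"
    "Place it between <start_working_out> and <end_working_out>.\n"
    "Then, provide your solution between <SOLUTION></SOLUTION>"
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "A baker makes 12 trays of 8 cookies each. How many cookies is that?"},
]

# Render the chat template, generate, and decode only the newly generated tokens.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The decoded text can then be passed to a parser like the one sketched earlier to separate the working out from the final `<SOLUTION>` span.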