Add model card
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ This model has been trained using Group Relative Policy Optimization (GRPO) to p
 - **Model Type**: PEFT (merged)
 - **Training Method**: GRPO (Group Relative Policy Optimization)
 - **Task**: Chess move generation with evaluation reasoning
-- **Source Path**: ./grpo_output/
+- **Source Path**: ./grpo_output/skill_6-final
 
 
 