Chess GRPO Trained Model

Model Type: PEFT (merged)
Training Method: GRPO (Group Relative Policy Optimization)
Task: Chess move generation with evaluation reasoning
Source Path: ./grpo_output/skill_6-final

This model was trained with Group Relative Policy Optimization (GRPO) to play chess. It generates each move as JSON output that includes the reasoning behind the move.
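
Because the model replies in JSON, a caller will usually want to validate each reply before acting on it. The sketch below shows one way that might look. The exact output schema is not documented here, so the key names "move" and "reasoning", the UCI-style move notation, and the helper name parse_move_output are all assumptions made for illustration and should be adjusted to match the model's real output.

```python
import json
import re

# Assumption: moves are UCI-style strings such as "e2e4" or "e7e8q".
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def parse_move_output(text):
    """Return (move, reasoning) from a JSON reply, or None if it is unusable."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # reply was not valid JSON
    if not isinstance(data, dict):
        return None  # valid JSON, but not an object
    move = data.get("move")                # hypothetical key name
    reasoning = data.get("reasoning", "")  # hypothetical key name
    if not isinstance(move, str) or not UCI_MOVE.fullmatch(move):
        return None  # missing or malformed move
    return move, str(reasoning)


# Hand-written sample reply, not actual model output.
sample = '{"move": "e2e4", "reasoning": "Claims the center."}'
print(parse_move_output(sample))
```

This check only confirms that the reply is well-formed. Whether the move is legal in the current position still has to be verified separately, for example with a chess library.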