---
license: apache-2.0
language:
- en
tags:
- chess
- reinforcement-learning
- grpo
- game-playing
pipeline_tag: text-generation
---
Chess GRPO Trained Model
This model was trained with Group Relative Policy Optimization (GRPO) to play chess. Given a position, it generates its move as a JSON object together with the reasoning behind that move.
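Because the model answers in JSON, downstream code usually has to pull the object out of the generated text and check the move before using it. The sketch below shows one way to do that. This card does not document the output schema, so the keys `move` and `reasoning`, the use of UCI notation (for example `e2e4`), and the helper name `parse_move_response` are all assumptions made for illustration.

```python
import json
import re

# Assumption: moves are in UCI notation (from-square, to-square, optional
# promotion piece), e.g. "e2e4" or "e7e8q". The card does not confirm this.
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def parse_move_response(text):
    """Return (move, reasoning) from generated text, or None if unusable.

    Hypothetical schema: {"move": "<uci>", "reasoning": "<text>"}.
    """
    start = text.find("{")
    if start == -1:
        return None  # no JSON object in the output
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        obj, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    move = obj.get("move")
    if not isinstance(move, str):
        return None
    move = move.strip().lower()
    if not UCI_MOVE.fullmatch(move):
        return None  # syntactically invalid move string
    return move, str(obj.get("reasoning", ""))
```

The check is purely syntactic. Whether the move is legal in the current position still has to be verified against the board, for instance with a chess library.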
Model Details
- Model Type: PEFT (merged)
- Training Method: GRPO (Group Relative Policy Optimization)
- Task: Chess move generation with evaluation reasoning
- Source Path: ./grpo_output/skill_6-final
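For readers unfamiliar with the training method, the core idea of GRPO is that several completions are sampled for the same prompt and each one is scored relative to the others in its group, so no separate value model is needed. The sketch below shows the commonly described group-normalized advantage. It is not taken from this model's training code: the reward function, group size, and exact normalization used here are not stated in this card, and the function name `group_relative_advantages` is hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and spread.

    rewards: scores for completions sampled from the same prompt
             (for chess, e.g. one score per candidate move).
    eps:     small constant that avoids division by zero when all
             rewards in the group are equal.
    Assumption: population standard deviation; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Completions that score above the group average get a positive advantage and are reinforced, while those below average get a negative one. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.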