metadata

license: apache-2.0
language:
  - en
tags:
  - chess
  - reinforcement-learning
  - grpo
  - game-playing
pipeline_tag: text-generation

Chess GRPO Trained Model

This model was trained with Group Relative Policy Optimization (GRPO) to play chess. It generates each chess move in JSON format, together with the reasoning behind the move.
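Because the model replies in JSON, downstream code usually needs to pull the JSON object out of the generated text. The snippet below is a minimal sketch of that step using only the Python standard library. The key names `move` and `reasoning`, the move notation, and the sample reply are assumptions made for illustration; this card does not document the exact output schema.

```python
import json


def parse_move_reply(text):
    """Return the first JSON object found in a model reply, or None."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        obj, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


# Hypothetical reply; the real schema may use different keys.
reply = 'My move: {"move": "e2e4", "reasoning": "Takes space in the center."}'
parsed = parse_move_reply(reply)
print(parsed["move"], "|", parsed["reasoning"])
```

Returning `None` on malformed output lets the caller decide whether to retry generation or fall back to another move source.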

Model Details

Model Type: PEFT adapter (merged into the base model)
Training Method: GRPO (Group Relative Policy Optimization)
Task: Chess move generation with evaluation reasoning
Source Path: ./grpo_output/skill_6-final
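For readers unfamiliar with the training method listed above, the core idea of GRPO is to sample a group of completions for the same prompt and score each one relative to the rest of its group, rather than against a learned value function. The sketch below shows only that normalization step. The reward values are made up, and the use of the population standard deviation and the `eps` constant are assumptions; the reward function and settings used to train this model are not described in this card.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    # eps avoids division by zero when every reward in the group is identical.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled moves in the same position.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group average receive a positive advantage and are reinforced; those below the average receive a negative one.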