metadata
license: apache-2.0
language:
  - en
tags:
  - chess
  - reinforcement-learning
  - grpo
  - game-playing
pipeline_tag: text-generation

Chess GRPO Trained Model

This model was trained with Group Relative Policy Optimization (GRPO) to play chess. It generates each chess move as a JSON object that also carries the reasoning behind the move.
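Since the model returns its move as JSON, downstream code will usually want to pull that object out of the generated text and validate it. The following is a minimal sketch, not the author's code: this card does not document the output schema, so the key names `move` and `reasoning`, the UCI-style move string, and the helper `parse_move_response` are all assumptions made for illustration.

```python
import json
from typing import Optional


def parse_move_response(text: str) -> Optional[dict]:
    """Parse the span from the first '{' to the last '}' as JSON.

    Returns None when no such span exists, when it is not valid JSON,
    or when the (assumed) "move" key is missing or not a string.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    # "move" is a hypothetical key name; adjust it to the model's real schema.
    if not isinstance(payload, dict) or not isinstance(payload.get("move"), str):
        return None
    return payload


# Hypothetical model output, with some text around the JSON object.
sample = 'Sure: {"move": "e2e4", "reasoning": "Claims the center."}'
parsed = parse_move_response(sample)
```

Returning `None` instead of raising lets a caller treat malformed output as an illegal move and retry or fall back, which tends to be convenient in a game loop.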

Model Details

  • Model Type: PEFT adapter, merged into the base model weights
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Task: Chess move generation with evaluation reasoning
  • Source Path: ./grpo_output/skill_6-final
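For readers unfamiliar with the training method listed above, the core idea of GRPO is that several completions are sampled for the same prompt and each one is scored relative to the others in its group. The sketch below shows only that normalization step, under stated assumptions: the reward values are invented, the function name `group_relative_advantages` is hypothetical, and the use of the population standard deviation with a small `eps` is one common choice rather than a detail confirmed by this card.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled moves from the same position.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group average receive a positive advantage and are reinforced; those below the average receive a negative one. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.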