Chess GRPO Trained Model

This model was trained with Group Relative Policy Optimization (GRPO) to play chess. It generates each move in JSON format, accompanied by its reasoning.
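Because the model replies in JSON, a caller will usually want to parse and sanity-check the response before playing the move. The following is a minimal sketch, not the model's documented schema: the key names "move" and "reasoning", the UCI move notation (e.g. e2e4), and the helper name parse_move_response are all assumptions made for illustration.

```python
import json
import re

# Assumption: moves are in UCI notation (e.g. "e2e4", "e7e8q").
# The model card does not state which notation the model emits.
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def parse_move_response(text):
    """Return (move, reasoning) from a model response, or None if invalid.

    The keys "move" and "reasoning" are hypothetical; adjust them to the
    schema the model actually produces.
    """
    # The response may wrap the JSON in extra text, so take the span
    # from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    move = payload.get("move")
    if not isinstance(move, str):
        return None
    move = move.strip().lower()
    # Syntax check only; this does not verify legality on a board.
    if not UCI_MOVE.fullmatch(move):
        return None
    return move, str(payload.get("reasoning", ""))
```

This only checks that the move is well-formed. Whether it is legal in the current position would still need to be verified with a chess library or engine.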

Model Details

  • Model Type: PEFT (merged)
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Task: Chess move generation with evaluation reasoning
  • Source Path: ./grpo_output/skill_6-final
  • Weights Format: Safetensors
  • Model Size: 1.24B params
  • Tensor Type: F16
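For readers unfamiliar with the training method named above, the core idea of GRPO as it is commonly described is to sample a group of responses for the same prompt and score each one relative to the rest of its group. The sketch below illustrates only that normalization step. The reward values are made up, and the reward function used to train this model is not documented here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate moves sampled for one position.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group's average receive a positive advantage and are reinforced, while those below average are discouraged. No separate value network is needed to provide the baseline.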