Chess GRPO Trained Model
This model was trained with Group Relative Policy Optimization (GRPO) to play chess. It generates chess moves in JSON format, with reasoning for each move.
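Since the model emits moves as JSON, downstream code usually needs to pull the JSON object out of the generated text and check the move before playing it. The sketch below shows one way this might be done with the standard library only. The field names `move` and `reasoning`, the UCI move notation, and the helper name `parse_move_response` are assumptions made for illustration; this card does not document the actual schema, so adjust them to match the real output.

```python
import json
import re
from typing import Optional

# Hypothetical schema: the model card does not document the field names.
MOVE_KEY = "move"
REASONING_KEY = "reasoning"

# Assumed UCI notation: from-square, to-square, optional promotion piece
# (for example "e2e4" or "e7e8q").
UCI_PATTERN = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")


def parse_move_response(text: str) -> Optional[dict]:
    """Parse the JSON object starting at the first '{' and validate its move field.

    Returns a dict with "move" and "reasoning" keys, or None if the text
    contains no parsable object or the move is not UCI-shaped.
    """
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        obj, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    move = obj.get(MOVE_KEY)
    if not isinstance(move, str):
        return None
    move = move.strip().lower()
    if not UCI_PATTERN.match(move):
        return None
    return {"move": move, "reasoning": str(obj.get(REASONING_KEY, ""))}
```

This only checks that the move is well-formed, not that it is legal in the current position; a legality check would need a chess library and the board state.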
Model Details
- Model Type: PEFT (merged)
- Training Method: GRPO (Group Relative Policy Optimization)
- Task: Chess move generation with evaluation reasoning
- Source Path: ./grpo_output/skill_6-final
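For readers unfamiliar with GRPO, its central idea is that several completions are sampled for the same prompt, each is scored by a reward function, and each completion's advantage is its reward normalized against the rest of its group. The sketch below illustrates that normalization step only. The function name, the use of the population standard deviation, and the `eps` value are illustrative assumptions, not the training code behind this model.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds the scores of all completions sampled for one prompt.
    `eps` (an assumed small constant) avoids division by zero when every
    completion in the group receives the same reward.
    """
    group_mean = mean(rewards)
    group_std = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - group_mean) / (group_std + eps) for r in rewards]
```

For example, if four candidate moves for one position receive rewards `[1.0, 0.0, 0.0, 1.0]`, the two rewarded moves get advantages close to +1 and the other two close to -1, so the policy is pushed toward the better moves relative to its own samples rather than toward an absolute score.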