Togyzkumalak RL Agent: Reinforcement Learning for the Kazakh Board Game
About the Game
Togyzkumalak is a traditional Kazakh board game. Two players take turns distributing stones around their side of the board and aim to capture more stones than the opponent.
Key Mechanics:
- Capture: If the last stone lands in an opponent's hole that then holds an even number of stones, you capture all of them.
- Tuzdyk: A special hole that can be created if the next hole after a zero contains exactly 1 or 3 stones.
- Bonus Move: If your last stone ends in your own empty hole, you get another move.
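The capture rule can be sketched in a few lines. This is a minimal illustration, not the environment's actual code: the function name `apply_capture`, the board layout (holes 0-8 for player 0, holes 9-17 for player 1), and the assumption that the count is checked after the last stone has landed are all hypothetical.

```python
def apply_capture(pockets, scores, player, last_idx):
    """Apply the even-count capture rule after a sowing move.

    pockets  : list of 18 stone counts (assumed layout: 0-8 player 0, 9-17 player 1)
    scores   : [score_player_0, score_player_1]
    player   : 0 or 1, the player who just moved
    last_idx : index of the hole where the last stone landed
    Returns new (pockets, scores) without mutating the inputs.
    """
    pockets = list(pockets)
    scores = list(scores)
    opponent_side = range(9, 18) if player == 0 else range(0, 9)
    # Capture only on the opponent's side, and only when the hole is now even.
    if last_idx in opponent_side and pockets[last_idx] % 2 == 0:
        scores[player] += pockets[last_idx]
        pockets[last_idx] = 0
    return pockets, scores


board = [9] * 18
board[10] = 4  # the last stone brought an opponent hole to 4 stones
new_board, new_scores = apply_capture(board, [0, 0], player=0, last_idx=10)
```

An odd count, or a last stone that lands on the mover's own side, leaves the board and scores unchanged.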
What the Agent Learns
This agent was trained using PPO from Stable Baselines3, with custom rewards for:
- Capturing stones
- Creating a tuzdyk
- Blocking the opponent
- Winning the game
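One way such a shaped reward could be combined is shown below. The card does not give the actual reward values, so every weight here is a hypothetical placeholder, as are the names `RewardWeights` and `shaped_reward`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # All values are illustrative assumptions, not the weights used in training.
    capture_per_stone: float = 0.1
    tuzdyk: float = 1.0
    block: float = 0.5
    win: float = 10.0


def shaped_reward(stones_captured, created_tuzdyk, blocked_opponent, won,
                  weights=RewardWeights()):
    """Sum the four reward terms listed above for a single step."""
    reward = weights.capture_per_stone * stones_captured
    if created_tuzdyk:
        reward += weights.tuzdyk
    if blocked_opponent:
        reward += weights.block
    if won:
        reward += weights.win
    return reward


# A step that captures 4 stones and creates a tuzdyk, without ending the game.
step_reward = shaped_reward(4, created_tuzdyk=True, blocked_opponent=False, won=False)
```

Keeping the weights in one small dataclass makes it easy to rebalance the terms if the agent starts chasing captures at the expense of winning.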
Training Summary
Parameter | Value |
---|---|
Total Timesteps | 600,000 |
Number of Updates | 2920 |
FPS (frames per second) | ~769 |
Mean Episode Length | ~9.87 |
Mean Episode Reward | +10.7 |
Learning Rate | 3e-4 |
Entropy Coefficient | 0.1 |
Policy Network | MLP |
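For readers who want to reproduce a similar run, the table maps onto a PPO configuration roughly as follows. Only the learning rate, entropy coefficient, policy type, and timestep budget come from the table; `N_STEPS` and `N_EPOCHS` are Stable Baselines3's default values and are an assumption here, since the card does not state them. The helper names `rollout_count` and `train` are hypothetical.

```python
import math

# Hyperparameters taken from the table above.
PPO_KWARGS = {"policy": "MlpPolicy", "learning_rate": 3e-4, "ent_coef": 0.1}
TOTAL_TIMESTEPS = 600_000

# Assumed: SB3 defaults, not confirmed by the model card.
N_STEPS = 2048
N_EPOCHS = 10


def rollout_count(total_timesteps, n_steps, n_envs=1):
    """Number of rollouts PPO collects to reach the timestep budget."""
    return math.ceil(total_timesteps / (n_steps * n_envs))


def train(env):
    """Train PPO on a Gymnasium-compatible env with the settings above."""
    # Imported lazily so the arithmetic in this sketch runs without SB3 installed.
    from stable_baselines3 import PPO

    model = PPO(env=env, n_steps=N_STEPS, n_epochs=N_EPOCHS, **PPO_KWARGS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    return model


rollouts = rollout_count(TOTAL_TIMESTEPS, N_STEPS)  # 293
epoch_passes = rollouts * N_EPOCHS                  # 2930
```

Under these assumed defaults the run would make about 2930 epoch passes, which is close to the 2920 updates reported in the table.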
Observation Space
The environment uses the following observation format:
Box(22,) -> [pockets[18], scores[2], player_turn[1], tuzdyk_idx[1]]
Each agent observes:
- Stone counts in all 18 holes
- Current score of both players
- Whose turn it is
- Whether a tuzdyk exists and its position
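The 22-element layout can be illustrated with a small packing helper. This is a sketch in plain Python rather than the environment's own code: the names `pack_observation` and `unpack_observation` are hypothetical, and using -1 to mean "no tuzdyk" is an assumption, since the card does not say how the single `tuzdyk_idx` slot encodes absence.

```python
N_HOLES = 18
NO_TUZDYK = -1  # assumed sentinel for "no tuzdyk on the board"


def pack_observation(pockets, scores, player_turn, tuzdyk_idx=NO_TUZDYK):
    """Flatten the game state into [pockets[18], scores[2], player_turn, tuzdyk_idx]."""
    if len(pockets) != N_HOLES:
        raise ValueError(f"expected {N_HOLES} pockets, got {len(pockets)}")
    if len(scores) != 2:
        raise ValueError(f"expected 2 scores, got {len(scores)}")
    return ([float(p) for p in pockets]
            + [float(s) for s in scores]
            + [float(player_turn), float(tuzdyk_idx)])


def unpack_observation(obs):
    """Split a 22-element observation back into named parts."""
    return {
        "pockets": obs[:18],
        "scores": obs[18:20],
        "player_turn": obs[20],
        "tuzdyk_idx": obs[21],
    }


# Standard starting position: 9 stones in each of the 18 holes, no score yet.
obs = pack_observation([9] * 18, [0, 0], player_turn=0)
parts = unpack_observation(obs)
```

A real environment would typically wrap the list in a NumPy array with a fixed dtype so that it matches the declared `Box(22,)` space.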
How to Use the Model
Use this model to play Togyzkumalak or train against it!
Load with huggingface_hub and huggingface_sb3:
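!pip install huggingface_hub huggingface_sb3

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from togyzkumalak_env import TogyzkumalakEnv

# Load model. load_from_hub needs the checkpoint file name as well as the repo id.
# MODEL_FILENAME is a placeholder: use the .zip file listed in the repository.
MODEL_FILENAME = "<model-file>.zip"
checkpoint_path = load_from_hub(repo_id="Eraly-ml/TogyzkumalakRL", filename=MODEL_FILENAME)
model = PPO.load(checkpoint_path)

# Create env
env = TogyzkumalakEnv(render_mode='human')
obs, _ = env.reset()

# Play until the episode terminates or is truncated
done = False
while not done:
    action, _ = model.predict(obs)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    env.render()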
Project contributors: Gainulla Yeraly