Togyzkumalak RL Agent: Reinforcement Learning for the Kazakh Board Game

About the Game

Togyzkumalak is a traditional Kazakh board game. Two players take turns distributing stones around their side of the board and aim to capture more stones than the opponent.

Key Mechanics:

  • Capture: If the last stone lands in an opponent's hole with an even number of stones → you capture all.
  • Tuzdyk: A special hole that can be created if the next hole after a zero contains exactly 1 or 3 stones.
  • Bonus Move: If your last stone ends in your own empty hole β†’ get another move.
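
The capture rule can be sketched in a few lines. This is only an illustration of the rule as described above, not the environment's actual code: the board layout (holes 0-8 for player 0, holes 9-17 for player 1), the function name `resolve_capture`, and the choice to count the stones after the last stone has landed are all assumptions.

```python
def resolve_capture(board, scores, player, landing_idx):
    """Apply the even-count capture rule once the last stone has landed.

    Assumed (hypothetical) layout: holes 0-8 belong to player 0,
    holes 9-17 belong to player 1. Mutates board and scores in place
    and returns the number of stones captured.
    """
    opponent_holes = range(9, 18) if player == 0 else range(0, 9)
    stones = board[landing_idx]  # count includes the stone that just landed
    if landing_idx in opponent_holes and stones > 0 and stones % 2 == 0:
        scores[player] += stones
        board[landing_idx] = 0
        return stones
    return 0


# Usage: player 0's last stone lands in hole 11, which now holds 10 stones.
board = [9] * 18
board[11] = 10
scores = [0, 0]
captured = resolve_capture(board, scores, player=0, landing_idx=11)
```

Landing in one of your own holes, or on an odd count, leaves the board and scores unchanged.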

What the Agent Learns

This agent was trained using PPO from Stable Baselines3, with custom rewards for:

  • Capturing stones
  • Creating a tuzdyk
  • Blocking the opponent
  • Winning the game
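
One way to combine these four signals into a single per-step reward is sketched below. The weights, the `RewardWeights` class, and the `shaped_reward` function are hypothetical; the values actually used to train this agent are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights for illustration only; not the trained agent's values.
    per_captured_stone: float = 0.25
    tuzdyk_created: float = 1.0
    opponent_blocked: float = 0.5
    win: float = 10.0


def shaped_reward(captured, created_tuzdyk, blocked_opponent, won,
                  weights=RewardWeights()):
    """Sum the reward terms triggered by a single move."""
    reward = weights.per_captured_stone * captured
    if created_tuzdyk:
        reward += weights.tuzdyk_created
    if blocked_opponent:
        reward += weights.opponent_blocked
    if won:
        reward += weights.win
    return reward


# Usage: a move that captures 4 stones and creates a tuzdyk.
r = shaped_reward(captured=4, created_tuzdyk=True, blocked_opponent=False, won=False)
```

Keeping the weights in one place makes it easier to check that the win bonus dominates the intermediate terms, so the agent is not rewarded more for collecting small bonuses than for winning.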

Training Summary

| Parameter | Value |
| --- | --- |
| Total Timesteps | 600,000 |
| Number of Updates | 2920 |
| FPS (frames per second) | ~769 |
| Mean Episode Length | ~9.87 |
| Mean Episode Reward | +10.7 |
| Learning Rate | 3e-4 |
| Entropy Coefficient | 0.1 |
| Policy Network | MLP |
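
A training run with these hyperparameters might be set up roughly as follows. This is a sketch, not the original training script: only the learning rate, entropy coefficient, policy type, and timestep budget come from the summary above, and everything else is left at Stable Baselines3 defaults. The `train` helper and the `rollouts_needed` arithmetic are illustrative. If the run used the default `n_steps=2048` and `n_epochs=10` on a single environment (an assumption), 600,000 timesteps need 293 rollouts, and (293 - 1) × 10 = 2920 matches the reported update count if the counter was read before the final round of updates.

```python
import math

# Values taken from the training summary; all other PPO settings are SB3 defaults.
PPO_KWARGS = {"policy": "MlpPolicy", "learning_rate": 3e-4, "ent_coef": 0.1}
TOTAL_TIMESTEPS = 600_000


def rollouts_needed(total_timesteps, n_steps=2048, n_envs=1):
    """Number of rollout collections needed to reach the timestep budget."""
    return math.ceil(total_timesteps / (n_steps * n_envs))


def train(env):
    """Train a PPO agent on env (assumed to be a Gymnasium-compatible environment)."""
    from stable_baselines3 import PPO  # imported lazily so the sketch loads without SB3

    model = PPO(env=env, verbose=1, **PPO_KWARGS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    return model


rollouts = rollouts_needed(TOTAL_TIMESTEPS)      # assumes default n_steps, one env
updates_before_last = (rollouts - 1) * 10        # assumes default n_epochs=10
```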

Observation Space

The environment uses the following observation format:

Box(22,) -> [pockets[18], scores[2], player_turn[1], tuzdyk_idx[1]]

Each agent observes:

  • Stone counts in all 18 holes
  • Current score of both players
  • Whose turn it is
  • Whether a tuzdyk exists and its position
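
The layout of the 22-element vector can be illustrated with a small encoder and decoder. This is a sketch in plain Python lists rather than the environment's own code; the function names and the `NO_TUZDYK = -1` sentinel for "no tuzdyk yet" are assumptions, since the encoding of a missing tuzdyk is not specified here.

```python
NUM_HOLES = 18
NO_TUZDYK = -1  # hypothetical sentinel; the real environment may encode this differently


def encode_observation(pockets, scores, player_turn, tuzdyk_idx=NO_TUZDYK):
    """Flatten the game state into [pockets[18], scores[2], player_turn, tuzdyk_idx]."""
    if len(pockets) != NUM_HOLES:
        raise ValueError(f"expected {NUM_HOLES} pockets, got {len(pockets)}")
    if len(scores) != 2:
        raise ValueError(f"expected 2 scores, got {len(scores)}")
    return [float(x) for x in (*pockets, *scores, player_turn, tuzdyk_idx)]


def decode_observation(obs):
    """Split a 22-element observation back into named fields."""
    return {
        "pockets": obs[:NUM_HOLES],
        "scores": obs[NUM_HOLES:NUM_HOLES + 2],
        "player_turn": obs[NUM_HOLES + 2],
        "tuzdyk_idx": obs[NUM_HOLES + 3],
    }


# Usage: opening position with 9 stones per hole, player 0 to move.
obs = encode_observation([9] * NUM_HOLES, [0, 0], player_turn=0)
fields = decode_observation(obs)
```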

How to Use the Model

Use this model to play Togyzkumalak or train against it!

Load with huggingface_hub and huggingface_sb3:

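!pip install huggingface_hub huggingface_sb3
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from togyzkumalak_env import TogyzkumalakEnv

# Download the checkpoint. load_from_hub needs the repo ID and the checkpoint's file name;
# MODEL_FILENAME is a placeholder, set it to the .zip file listed in the repository.
MODEL_FILENAME = "..."
model = PPO.load(load_from_hub(repo_id="Eraly-ml/TogyzkumalakRL", filename=MODEL_FILENAME))

# Create env
env = TogyzkumalakEnv(render_mode='human')
obs, _ = env.reset()

# Play one game; stop when the episode terminates or is truncated
done = False
while not done:
    action, _ = model.predict(obs)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    env.render()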
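
To get a rough estimate of playing strength rather than watching a single game, the same loop can be wrapped in an evaluation helper. The `evaluate` function below is a hypothetical sketch: it assumes the environment follows the Gymnasium API (`reset` returning `(obs, info)` and `step` returning five values) and that the model exposes `predict(obs, deterministic=...)` as Stable Baselines3 models do.

```python
def evaluate(model, env, episodes=10):
    """Return the mean undiscounted episode return over a number of games."""
    returns = []
    for _ in range(episodes):
        obs, _info = env.reset()
        done = False
        total = 0.0
        while not done:
            # Greedy actions make the estimate less noisy than sampled ones.
            action, _state = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns)
```

Calling `evaluate(model, env, episodes=100)` with the model and environment loaded above would give a figure comparable in spirit to the mean episode reward in the training summary, though the two need not match exactly.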

Project contributors: Gainulla Yeraly
