Togyzkumalak RL Agent — Reinforcement Learning for the Kazakh Board Game

About the Game

Togyzkumalak is a traditional Kazakh board game. Two players take turns distributing stones around their side of the board and aim to capture more stones than the opponent.

Key Mechanics:

Capture: If your last stone lands in an opponent's hole and leaves it with an even number of stones, you capture all the stones in that hole.
Tuzdyk: A special hole that can be created if the next hole after a zero contains exactly 1 or 3 stones.
Bonus Move: If your last stone lands in one of your own empty holes, you get another move.
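
The capture rule above can be sketched in a few lines. This is a minimal illustration, not the environment's actual code: the function name `apply_capture` and the board layout (indices 0-8 for player 0, indices 9-17 for player 1) are assumptions made for the example.

```python
def apply_capture(pockets, scores, last_idx, player):
    """Apply the even-count capture rule after a move has been sown.

    Assumed layout (not confirmed by the source): indices 0-8 belong to
    player 0 and indices 9-17 belong to player 1.
    Returns new (pockets, scores) lists; the inputs are not modified.
    """
    pockets = list(pockets)
    scores = list(scores)
    opponent_side = range(9, 18) if player == 0 else range(0, 9)
    stones = pockets[last_idx]
    # Capture only when the last stone ended in an opponent's hole that
    # now holds a positive, even number of stones.
    if last_idx in opponent_side and stones > 0 and stones % 2 == 0:
        scores[player] += stones
        pockets[last_idx] = 0
    return pockets, scores
```

For example, if player 0's last stone brings hole 12 to 10 stones, all 10 move to player 0's score and the hole is emptied; with 9 stones nothing happens.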

What the Agent Learns

This agent was trained using PPO from Stable Baselines3, with custom rewards for:

Capturing stones
Creating a tuzdyk
Blocking the opponent
Winning the game
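
One way to combine the four reward terms listed above is a weighted sum over what happened on a move. The sketch below is only an illustration: the `MoveOutcome` fields, the `shaped_reward` function, and all weight values are hypothetical and are not taken from the trained agent.

```python
from dataclasses import dataclass


@dataclass
class MoveOutcome:
    """What a single move achieved (hypothetical structure)."""
    stones_captured: int = 0
    created_tuzdyk: bool = False
    blocked_opponent: bool = False
    won_game: bool = False


# Hypothetical weights chosen for illustration; the source does not state them.
REWARD_WEIGHTS = {"capture": 0.1, "tuzdyk": 1.0, "block": 0.5, "win": 10.0}


def shaped_reward(outcome, weights=REWARD_WEIGHTS):
    """Return the weighted sum of the reward terms triggered by a move."""
    reward = weights["capture"] * outcome.stones_captured
    if outcome.created_tuzdyk:
        reward += weights["tuzdyk"]
    if outcome.blocked_opponent:
        reward += weights["block"]
    if outcome.won_game:
        reward += weights["win"]
    return reward
```

Keeping the weights in one dictionary makes it easy to experiment with how strongly each event should be rewarded relative to winning.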

Training Summary

| Parameter | Value |
|---|---|
| Total Timesteps | 600,000 |
| Number of Updates | 2920 |
| FPS (frames per second) | ~769 |
| Mean Episode Length | ~9.87 |
| Mean Episode Reward | +10.7 |
| Learning Rate | `3e-4` |
| Entropy Coefficient | `0.1` |
| Policy Network | MLP |
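
A training run with the hyperparameters in the table could be set up roughly as follows. This is a sketch, not the original training script: the `train` helper and the `save_path` default are hypothetical, and any PPO setting not listed in the table is left at the Stable Baselines3 default.

```python
# Hyperparameters taken from the table above.
PPO_KWARGS = {
    "policy": "MlpPolicy",   # MLP policy network
    "learning_rate": 3e-4,
    "ent_coef": 0.1,         # entropy coefficient
}
TOTAL_TIMESTEPS = 600_000


def train(env, save_path="togyzkumalak_ppo"):
    """Train PPO on `env` and save the result (save_path is a hypothetical name)."""
    # Imported here so the configuration can be inspected without SB3 installed.
    from stable_baselines3 import PPO

    model = PPO(env=env, verbose=1, **PPO_KWARGS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    model.save(save_path)
    return model
```

Calling `train(TogyzkumalakEnv())` would start a run with these settings; the reported episode length and reward depend on the reward design and are not guaranteed to reproduce.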

Observation Space

The environment uses the following observation format:

Box(22,) -> [pockets[18], scores[2], player_turn[1], tuzdyk_idx[1]]

Each agent observes:

Stone counts in all 18 holes
Current score of both players
Whose turn it is
Whether a tuzdyk exists and its position
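
To make the layout concrete, the 22-element vector can be split back into its named parts by slicing. The helper below is a sketch: `decode_observation` is a hypothetical name, and it returns the raw `tuzdyk_idx` value because the source does not say how "no tuzdyk" is encoded.

```python
def decode_observation(obs):
    """Split a 22-element observation into the fields listed above.

    Follows the documented order:
    [pockets[18], scores[2], player_turn[1], tuzdyk_idx[1]].
    Accepts a list or a NumPy array.
    """
    values = list(obs)
    if len(values) != 22:
        raise ValueError(f"expected 22 values, got {len(values)}")
    return {
        "pockets": values[0:18],      # stone counts in all 18 holes
        "scores": values[18:20],      # current score of both players
        "player_turn": values[20],    # whose turn it is
        "tuzdyk_idx": values[21],     # raw tuzdyk field; encoding not specified
    }
```

This is mainly useful for debugging, for example printing the board state that the policy saw before it chose an action.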

How to Use the Model

Use this model to play Togyzkumalak or train against it!

Load with `huggingface_hub` and `huggingface_sb3`:

# Notebook syntax; in a terminal, drop the leading "!"
!pip install huggingface_hub huggingface_sb3 stable-baselines3

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from togyzkumalak_env import TogyzkumalakEnv

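# Load model. load_from_hub needs the checkpoint file name as well as the repo id.
# MODEL_FILENAME is an assumption; check the repository's file list for the real name.
MODEL_FILENAME = "model.zip"
checkpoint = load_from_hub(repo_id="Eraly-ml/TogyzkumalakRL", filename=MODEL_FILENAME)
model = PPO.load(checkpoint)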

# Create env
env = TogyzkumalakEnv(render_mode='human')
obs, _ = env.reset()

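# Play one episode. step() returns (obs, reward, terminated, truncated, info),
# so the episode ends when either flag is set.
done = False
while not done:
    action, _ = model.predict(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    env.render()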
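
To judge the agent over more than one game, the same loop can be wrapped in a small helper that averages the episode return. This is a sketch under the assumption that the environment follows the Gymnasium `reset`/`step` interface used above; the `evaluate` function is hypothetical. Stable Baselines3 also ships `evaluate_policy` in `stable_baselines3.common.evaluation`, which serves the same purpose.

```python
def evaluate(model, env, n_episodes=10, deterministic=True):
    """Play n_episodes and return (mean_return, list_of_returns)."""
    if n_episodes < 1:
        raise ValueError("n_episodes must be at least 1")
    returns = []
    for _ in range(n_episodes):
        obs, _info = env.reset()
        done = False
        total = 0.0
        while not done:
            # deterministic=True picks the most likely action instead of sampling.
            action, _state = model.predict(obs, deterministic=deterministic)
            obs, reward, terminated, truncated, _info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns), returns
```

For example, `mean_return, returns = evaluate(model, TogyzkumalakEnv(), n_episodes=20)` gives a rough estimate of playing strength without rendering each move.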

Project contributors: Gainulla Yeraly