A newer version of this model is available: FlameF0X/o2

o1 Chess Agent

An experimental chess AI powered by deep reinforcement learning, MCTS, and denoising diffusion

[Chess agent demo image, generated by Sora (OpenAI)]

Model Description

o1 is a modern chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and a denoising diffusion process for move selection. The model learns entirely through self-play, alternating between white and black, and is trained with a reward system that combines game outcome, material balance, and penalties for stalling.

Key Features

🧠 Advanced Architecture

  • Hybrid Diffusion + Transformer Model: Combines deep residual blocks, a transformer encoder, and a denoising diffusion process for robust move policy and long-range reasoning (a minimal sketch follows this list)
  • Deep Residual Neural Network: 20+ residual blocks for strong pattern recognition
  • Monte Carlo Tree Search (MCTS): Looks ahead by simulating many possible futures
  • Denoising Diffusion: Backward (reverse) diffusion process for creative move selection
  • Dual-Color Training: Learns optimal play as both white and black
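
As a rough illustration, here is a minimal PyTorch sketch of such a hybrid policy-value network. The channel counts, number of residual blocks, and 4096-way from-to move space are illustrative assumptions, not the released architecture, and the diffusion head is only indicated in a comment:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)            # skip connection

class HybridPolicyValueNet(nn.Module):
    """Residual tower for local patterns + transformer encoder for global context."""
    def __init__(self, in_ch=20, ch=128, n_res=8, n_heads=4):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.tower = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_res)])
        layer = nn.TransformerEncoderLayer(d_model=ch, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.policy = nn.Linear(ch, 4096)   # from-square x to-square moves (assumed)
        self.value = nn.Sequential(nn.Linear(ch, 64), nn.ReLU(),
                                   nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):                   # x: (batch, 20, 8, 8) board tensor
        h = self.tower(torch.relu(self.stem(x)))
        seq = h.flatten(2).transpose(1, 2)  # (batch, 64, ch): one token per square
        g = self.encoder(seq).mean(dim=1)   # pooled global board representation
        # A denoising diffusion head would iteratively refine noisy policy
        # logits here; omitted for brevity.
        return self.policy(g), self.value(g)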

🔄 Self-Play Learning

  • Continuous Self-Improvement: Learns by playing against itself
  • Balanced Reward System: Rewards for wins, penalties for losses, and material-based rewards (e.g., +9 for capturing a queen, -9 for losing one); see the reward sketch after this list
  • Move Limit Penalty: Extra penalty if the game ends by move limit
  • Experience Replay: Trains from a buffer of past games
  • Early Stopping: Prevents overfitting during training
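
A sketch of how such a reward could be computed with python-chess; the piece values match the example above, while the stalling-penalty magnitude is an assumption:

import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def material_reward(board: chess.Board, move: chess.Move) -> float:
    """Per-move reward for the mover: value of the captured piece, if any.
    (En passant captures omitted for brevity.)"""
    captured = board.piece_at(move.to_square)
    if captured is not None and captured.piece_type in PIECE_VALUES:
        return float(PIECE_VALUES[captured.piece_type])   # e.g. +9 for a queen
    return 0.0

def terminal_reward(board: chess.Board, agent_is_white: bool,
                    hit_move_limit: bool) -> float:
    """Game-outcome reward, with an extra penalty if the game hit the move limit."""
    result = board.result(claim_draw=True)    # "1-0", "0-1", "1/2-1/2", or "*"
    reward = 0.0
    if result == "1-0":
        reward = 1.0 if agent_is_white else -1.0
    elif result == "0-1":
        reward = -1.0 if agent_is_white else 1.0
    if hit_move_limit:
        reward -= 0.5                         # stalling penalty; magnitude assumed
    return reward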

🎯 Experimental & Evaluation Features

  • Diffusion-Based Exploration: Denoising process for creative and robust move selection
  • Transformer Block: Integrates a transformer encoder for global board context
  • ELO Evaluation: Built-in script to estimate model strength via ELO rating (see the estimation sketch after this list)
  • Top-N Move Restriction: Optionally restricts move selection to the top-N MCTS candidates
  • No Handcrafted Heuristics: All strategy is learned, not hardcoded
  • Advanced Board/Move Encoding: Includes en passant, repetition, 50-move rule, and robust move encoding/decoding
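
For context, an ELO estimate can be backed out of a match score against a rated opponent using the standard logistic expected-score formula; a minimal sketch, where the opponent rating and game counts are placeholders:

import math

def expected_score(r_player: float, r_opponent: float) -> float:
    """Standard ELO expected score for the player."""
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400.0))

def estimate_elo(points: float, games: int, r_opponent: float) -> float:
    """Invert the expected-score formula: the rating that explains the result."""
    p = min(max(points / games, 1e-3), 1.0 - 1e-3)   # clamp to avoid infinities
    return r_opponent - 400.0 * math.log10(1.0 / p - 1.0)

# e.g. 12.5 points from 100 games against a 600-rated baseline (placeholders):
print(round(estimate_elo(12.5, 100, 600.0), 1))      # ~262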

🖥️ Modern UI

  • Streamlit Web App: Drag-and-drop chess play against the agent, with Hugging Face Hub model loading and a browser-based UI

Architecture Details

o1 now features a hybrid neural architecture:

  • Hybrid Policy-Value Network: Residual blocks for local patterns, transformer encoder for global context, and a denoising diffusion process for robust move selection
  • Advanced Board Representation: 20 input channels including piece types, castling rights, move count, en passant, repetition, and 50-move rule
  • MCTS Integration: Tree search guided by neural network evaluations (a selection-rule sketch follows this list)
  • Self-Play Pipeline: Automated training through continuous game generation
  • Material Reward System: Rewards/penalties for capturing or losing pieces
  • ELO Evaluation: Play games between agents and estimate ELO rating
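
As an illustration of how tree search is typically guided by the network, here is the standard AlphaZero-style PUCT selection rule; this is the common formulation, not necessarily the exact variant used here:

import math

class Node:
    def __init__(self, prior: float):
        self.prior = prior        # P(s, a) from the policy head
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # move -> Node

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node: Node, c_puct: float = 1.5):
    """Pick the child maximizing Q(s, a) + U(s, a), the PUCT score."""
    total = sum(child.visits for child in node.children.values())
    best, best_score = None, -math.inf
    for move, child in node.children.items():
        score = child.q() + c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        if score > best_score:
            best, best_score = (move, child), score
    return best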

Usage: Play Against o1

You can load the model and play against o1 directly from the Hugging Face Hub, or use the Streamlit web app:
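
A minimal loading sketch follows; the checkpoint filename model.pt and the commented-out model class and helper are illustrative assumptions, so substitute the actual names shipped in the repository:

import chess
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub (the filename is an assumption).
ckpt_path = hf_hub_download(repo_id="FlameF0X/o1", filename="model.pt")
state = torch.load(ckpt_path, map_location="cpu")

# The repository's own model class and move decoding would be used here, e.g.:
# model = HybridPolicyValueNet(); model.load_state_dict(state); model.eval()
board = chess.Board()
# move = select_move(model, board)   # hypothetical helper
# board.push(move)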


Or launch the web UI:

pip install streamlit python-chess torch huggingface_hub
streamlit run app.py

Requirements

  • torch
  • numpy
  • python-chess
  • huggingface_hub
  • streamlit (for web UI)

Performance

  • Trained on millions of self-play games
  • Demonstrates emerging tactical awareness learned entirely from self-play
  • Capable of longer-term planning through MCTS lookahead
  • Plays every game phase with the same learned policy and search
  • Learns to value material and avoid stalling
  • Estimated ELO of 257.3

Applications

Research

  • Study emergent chess strategies through AI self-play
  • Analyze the effectiveness of different neural architectures for game playing
  • Experiment with denoising diffusion in strategic games

Education

  • Chess training partner with adjustable difficulty
  • Analysis tool for game improvement
  • Demonstration of reinforcement learning principles

Competition

  • Tournament-ready chess engine
  • Benchmark for other chess AI systems
  • Platform for testing new chess AI techniques

Technical Specifications

  • Framework: PyTorch with custom MCTS, hybrid diffusion+transformer model
  • Training Method: Self-play reinforcement learning with early stopping
  • Architecture: Residual + Transformer + Diffusion
  • Search Algorithm: Monte Carlo Tree Search
  • Evaluation: Combined policy and value head outputs, ELO script
  • Reward System: Game result, material, and move limit penalties
  • Board/Move Encoding: 20-channel board tensor with robust move encoding/decoding (see the encoding sketch after this list)
  • UI: Streamlit drag-and-drop web app
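
A sketch of one plausible 20-plane encoding consistent with the features listed above; the plane order and normalization constants are assumptions:

import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    """20x8x8 tensor: 12 piece planes, 4 castling planes, move count,
    en passant, repetition, and the 50-move clock (order assumed)."""
    planes = np.zeros((20, 8, 8), dtype=np.float32)
    for sq, piece in board.piece_map().items():
        ch = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        planes[ch, chess.square_rank(sq), chess.square_file(sq)] = 1.0
    planes[12] = float(board.has_kingside_castling_rights(chess.WHITE))
    planes[13] = float(board.has_queenside_castling_rights(chess.WHITE))
    planes[14] = float(board.has_kingside_castling_rights(chess.BLACK))
    planes[15] = float(board.has_queenside_castling_rights(chess.BLACK))
    planes[16] = min(board.fullmove_number / 100.0, 1.0)   # move count
    if board.ep_square is not None:                        # en passant target
        planes[17, chess.square_rank(board.ep_square),
               chess.square_file(board.ep_square)] = 1.0
    planes[18] = float(board.is_repetition(2))             # position repeated once
    planes[19] = board.halfmove_clock / 100.0              # 50-move rule clock
    return planes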

Citation

If you use o1 Chess Agent in your research, please cite:

@misc{o1-chess-agent,
  title={o1 Chess Agent: Deep Reinforcement Learning for Strategic Game Play},
  author={FlameF0X},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/FlameF0X/o1}}
}