---
license: apache-2.0
model_name: o1
repo: FlameF0X/o1
pipeline_tag: reinforcement-learning
tags:
- chess
- reinforcement-learning
- mcts
- diffusion hybrid
- alpha-zero-style
- deep-learning
- monte-carlo-tree-search
- self-play
- alphazero
- research
- games
new_version: FlameF0X/o2
---
# o1 Chess Agent
**A chess AI powered by deep reinforcement learning, MCTS, and denoising diffusion**
![Chess Agent Demo](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/aUgmFN9FHU_7GU25vfLSb.webp)
*Image by Sora, OpenAI*
## Model Description
o1 is a modern chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and a denoising diffusion process for move selection. The model learns entirely by self-play, switching between black and white, and is trained with a sophisticated reward system that includes game outcome, material balance, and penalties for stalling.
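The tags and description point to an AlphaZero-style setup, where the tree search is guided by the network's policy prior and value estimate. As a rough illustration only, a PUCT-style selection step might look like the sketch below; the `Node` class and the `c_puct` constant are assumptions, not the repository's actual code.
```python
# Illustrative PUCT selection step for an AlphaZero-style MCTS.
# The Node class and c_puct value here are assumptions, not the repo's actual code.
import math

class Node:
    def __init__(self, prior):
        self.prior = prior          # P(s, a) from the policy head
        self.visit_count = 0        # N(s, a)
        self.value_sum = 0.0        # W(s, a)
        self.children = {}          # move -> Node

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=1.5):
    """Pick the child maximizing Q + U, the standard PUCT rule."""
    total_visits = sum(child.visit_count for child in node.children.values())
    best_move, best_score = None, -float("inf")
    for move, child in node.children.items():
        u = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visit_count)
        score = child.value() + u
        if score > best_score:
            best_move, best_score = move, score
    return best_move, node.children[best_move]
```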
## Key Features
### 🧠 **Advanced Architecture**
- **Hybrid Diffusion + Transformer Model**: Combines deep residual blocks, a transformer encoder, and a denoising diffusion process for a robust move policy and long-range reasoning (see the sketch after this list)
- **Deep Residual Neural Network**: 20+ residual blocks for strong pattern recognition
- **Monte Carlo Tree Search (MCTS)**: Looks ahead by simulating many possible futures
- **Denoising Diffusion**: Backward (reverse) diffusion process for creative move selection
- **Dual-Color Training**: Learns optimal play as both white and black
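As a rough sketch of how residual blocks, a transformer encoder, and policy/value heads can be combined, the hypothetical module below illustrates the idea; the channel width, block depth, attention settings, and the 4672-way policy size are assumptions rather than the model's actual hyperparameters, and the diffusion component is omitted.
```python
# Minimal sketch of a residual + transformer policy-value network.
# Channel counts, block depth, and the 4672-move policy size are assumptions;
# the denoising diffusion head described above is omitted for brevity.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)

class HybridPolicyValueNet(nn.Module):
    def __init__(self, in_ch=20, ch=128, blocks=20, policy_size=4672):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        self.tower = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        encoder_layer = nn.TransformerEncoderLayer(d_model=ch, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.policy_head = nn.Linear(ch * 64, policy_size)
        self.value_head = nn.Sequential(nn.Linear(ch * 64, 256), nn.ReLU(),
                                        nn.Linear(256, 1), nn.Tanh())

    def forward(self, board_tensor):                 # (B, 20, 8, 8)
        x = self.tower(self.stem(board_tensor))      # (B, ch, 8, 8)
        tokens = x.flatten(2).transpose(1, 2)        # (B, 64, ch): one token per square
        tokens = self.transformer(tokens)            # global board context
        flat = tokens.flatten(1)
        return self.policy_head(flat), self.value_head(flat)
```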
### πŸ”„ **Self-Play Learning**
- **Continuous Self-Improvement**: Learns by playing against itself
- **Balanced Reward System**: Rewards for wins, penalties for losses, and material-based rewards (e.g., +9 for capturing a queen, -9 for losing one); see the reward-shaping sketch after this list
- **Move Limit Penalty**: Extra penalty if the game ends by move limit
- **Experience Replay**: Trains from a buffer of past games
- **Early Stopping**: Prevents overfitting during training
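A minimal sketch of the reward shaping described above, using python-chess; apart from the ±9 queen value stated here, the piece values and penalty magnitudes are assumptions:
```python
# Sketch of the reward shaping described in this section. Only the +/-9 queen value
# comes from the model card; the other constants are illustrative assumptions.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material_reward(board: chess.Board, move: chess.Move) -> float:
    """Positive reward for capturing material (mirrored as a penalty for the side losing it)."""
    if board.is_capture(move):
        captured = board.piece_at(move.to_square)
        if captured is None:                  # en passant: the captured pawn is not on to_square
            return PIECE_VALUES[chess.PAWN]
        return PIECE_VALUES[captured.piece_type]
    return 0.0

def terminal_reward(board: chess.Board, agent_color: chess.Color,
                    hit_move_limit: bool, move_limit_penalty: float = 5.0) -> float:
    """Outcome reward at the end of a game, with an extra penalty for stalling to the move limit."""
    if hit_move_limit:
        return -move_limit_penalty
    outcome = board.outcome()
    if outcome is None or outcome.winner is None:
        return 0.0                            # draw or unfinished game
    return 10.0 if outcome.winner == agent_color else -10.0
```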
### 🎯 **Experimental & Evaluation Features**
- **Diffusion-Based Exploration**: Denoising process for creative and robust move selection
- **Transformer Block**: Integrates a transformer encoder for global board context
- **ELO Evaluation**: Built-in script to estimate model strength via ELO rating
- **Top-N Move Restriction**: Optionally restricts move selection to the top-N MCTS candidates (see the sketch after this list)
- **No Handcrafted Heuristics**: All strategy is learned, not hardcoded
- **Advanced Board/Move Encoding**: Includes en passant, repetition, 50-move rule, and robust move encoding/decoding
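The top-N restriction can be illustrated with a small helper over MCTS visit counts; the data structure and visit-count weighting below are assumptions, not the repository's actual code:
```python
# Illustrative top-N restriction over MCTS visit counts.
# The visit_counts dict and weighting scheme are assumptions.
import random

def pick_move_top_n(visit_counts: dict, n: int = 5):
    """Sample a move from the n most-visited MCTS children, weighted by visit count."""
    top = sorted(visit_counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
    moves, counts = zip(*top)
    return random.choices(moves, weights=counts, k=1)[0]
```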
### πŸ–₯️ **Modern UI**
- **Streamlit Web App**: Play drag-and-drop chess against the agent in the browser, with the model loaded from the Hugging Face Hub
## Architecture Details
o1 now features a hybrid neural architecture:
- **Hybrid Policy-Value Network**: Residual blocks for local patterns, transformer encoder for global context, and a denoising diffusion process for robust move selection
- **Advanced Board Representation**: 20 input channels including piece types, castling rights, move count, en passant, repetition, and the 50-move rule (see the encoding sketch after this list)
- **MCTS Integration**: Tree search guided by neural network evaluations
- **Self-Play Pipeline**: Automated training through continuous game generation
- **Material Reward System**: Rewards/penalties for capturing or losing pieces
- **ELO Evaluation**: Play games between agents and estimate ELO rating
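A sketch of one way to build the 20-channel board tensor described above with python-chess: 12 piece planes, 4 castling-rights planes, en passant, move count, repetition, and the 50-move counter. The plane ordering and normalization choices are assumptions:
```python
# Sketch of a 20-channel board encoding consistent with the description above.
# Plane order and scaling are assumptions, not the repository's actual encoding.
import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    planes = np.zeros((20, 8, 8), dtype=np.float32)
    # Planes 0-11: one plane per (color, piece type).
    for square, piece in board.piece_map().items():
        offset = 0 if piece.color == chess.WHITE else 6
        planes[offset + piece.piece_type - 1, square // 8, square % 8] = 1.0
    # Planes 12-15: castling rights as constant planes.
    planes[12] = float(board.has_kingside_castling_rights(chess.WHITE))
    planes[13] = float(board.has_queenside_castling_rights(chess.WHITE))
    planes[14] = float(board.has_kingside_castling_rights(chess.BLACK))
    planes[15] = float(board.has_queenside_castling_rights(chess.BLACK))
    # Plane 16: en passant target square, if any.
    if board.ep_square is not None:
        planes[16, board.ep_square // 8, board.ep_square % 8] = 1.0
    # Plane 17: normalized full-move counter.
    planes[17] = min(board.fullmove_number / 100.0, 1.0)
    # Plane 18: repetition flag for the current position.
    planes[18] = float(board.is_repetition(2))
    # Plane 19: normalized 50-move (halfmove clock) counter.
    planes[19] = board.halfmove_clock / 100.0
    return planes
```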
## Usage: Play Against o1
You can load the model and play against o1 directly from the Hugging Face Hub, or use the Streamlit web app.
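The snippet below is a minimal sketch of loading a checkpoint from the Hub; the `model.pt` filename, the model class, and the encoding/decoding helpers are assumptions about the repository layout, so adjust them to match the actual files in FlameF0X/o1.
```python
# Minimal sketch: download a checkpoint from the Hub and ask the agent for a move.
# The "model.pt" filename and the commented-out model/encoding helpers are assumptions.
import chess
import torch
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(repo_id="FlameF0X/o1", filename="model.pt")  # assumed filename
state_dict = torch.load(checkpoint_path, map_location="cpu")

# model = HybridPolicyValueNet()        # assumed model class; the repo's actual class may differ
# model.load_state_dict(state_dict)
# model.eval()

board = chess.Board()
# policy, value = model(encode_board_tensor(board))   # assumed encoding helper
# board.push(decode_best_move(policy, board))         # assumed decoding helper
print(board)
```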
Or launch the web UI:
```powershell
pip install streamlit python-chess torch huggingface_hub
streamlit run app.py
```
## Requirements
- torch
- numpy
- python-chess
- huggingface_hub
- streamlit (for web UI)
## Performance
- Trained on millions of self-play games
- Demonstrates tactical awareness comparable to strong human players
- Capable of long-term strategic planning through MCTS
- Robust performance across all game phases
- Learns to value material and avoid stalling
- Estimated ELO of 257.3
## Applications
### Research
- Study emergent chess strategies through AI self-play
- Analyze the effectiveness of different neural architectures for game playing
- Experiment with denoising diffusion in strategic games
### Education
- Chess training partner with adjustable difficulty
- Analysis tool for game improvement
- Demonstration of reinforcement learning principles
### Competition
- Tournament-ready chess engine
- Benchmark for other chess AI systems
- Platform for testing new chess AI techniques
## Technical Specifications
- **Framework**: PyTorch with custom MCTS, hybrid diffusion+transformer model
- **Training Method**: Self-play reinforcement learning with early stopping
- **Architecture**: Residual + Transformer + Diffusion
- **Search Algorithm**: Monte Carlo Tree Search
- **Evaluation**: Combined policy and value head outputs, plus an ELO rating script (see the sketch after this list)
- **Reward System**: Game result, material, and move limit penalties
- **Board/Move Encoding**: 20-channel board tensor, robust move encoding/decoding
- **UI**: Streamlit drag-and-drop web app
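For reference, ELO estimation against a fixed-rating baseline typically uses the standard expected-score formula; the K-factor, anchor rating, and results below are illustrative assumptions and may differ from the repository's ELO script:
```python
# Sketch of an Elo estimate from head-to-head results against a fixed-rating opponent.
# K-factor, anchor rating, and game results are illustrative assumptions.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating: float, opponent_rating: float, score: float, k: float = 32.0) -> float:
    """score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent_rating))

# Example: start the agent at 0 and play a short series against a 1000-rated baseline.
rating = 0.0
for score in [1, 0, 0.5, 0, 0, 1, 0]:   # illustrative results
    rating = update_elo(rating, 1000.0, score)
print(f"Estimated rating after the match: {rating:.1f}")
```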
## Citation
If you use o1 Chess Agent in your research, please cite:
```bibtex
@misc{o1-chess-agent,
  title        = {o1 Chess Agent: Deep Reinforcement Learning for Strategic Game Play},
  author       = {FlameF0X},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FlameF0X/o1}}
}
```