---
license: apache-2.0
model_name: o1
repo: FlameF0X/o1
pipeline_tag: reinforcement-learning
tags:
- chess
- reinforcement-learning
- mcts
- diffusion-hybrid
- alpha-zero-style
- deep-learning
- monte-carlo-tree-search
- self-play
- alphazero
- research
- games
new_version: FlameF0X/o2
---
# o1 Chess Agent
**A state-of-the-art chess AI powered by deep reinforcement learning, MCTS, and denoising diffusion**

*Image by Sora, OpenAI*
## Model Description
o1 is a modern chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and a denoising diffusion process for move selection. The model learns entirely by self-play, switching between black and white, and is trained with a sophisticated reward system that includes game outcome, material balance, and penalties for stalling.
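To make the interplay concrete, here is a minimal AlphaZero-style PUCT selection step, the rule MCTS commonly uses to balance the network's move priors against accumulated search statistics. This is a generic sketch, not the repo's actual code; the `Node` class, names, and the `c_puct` constant are assumptions.

```python
import math

class Node:
    """One MCTS node; `prior` comes from the network's policy head."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy head
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # accumulated value from simulations
        self.children = {}        # move -> Node

def puct_score(parent_visits, child, c_puct=1.5):
    # AlphaZero-style PUCT: mean value (exploitation) plus a
    # prior-weighted exploration bonus that shrinks with visits.
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return q + u

def select_child(node):
    # Descend to the (move, child) pair with the highest PUCT score.
    return max(node.children.items(),
               key=lambda kv: puct_score(node.visits, kv[1]))
```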
## Key Features
### 🧠 **Advanced Architecture**
- **Hybrid Diffusion + Transformer Model**: Combines deep residual blocks, a transformer encoder, and a denoising diffusion process for robust move policy and long-range reasoning (a minimal sketch of this trunk follows this list)
- **Deep Residual Neural Network**: 20+ residual blocks for strong pattern recognition
- **Monte Carlo Tree Search (MCTS)**: Looks ahead by simulating many possible futures
- **Denoising Diffusion**: Backward (reverse) diffusion process for creative move selection
- **Dual-Color Training**: Learns optimal play as both white and black
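As a rough picture of what such a hybrid trunk could look like, here is a minimal PyTorch sketch. The channel width, block count, head layout, and the 4672-way move space are assumptions (the card does not publish exact sizes), the diffusion head is omitted, and `HybridNet` is an illustrative name, not the repo's class.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        return torch.relu(x + self.bn2(self.conv2(h)))

class HybridNet(nn.Module):
    """Residual trunk for local patterns + transformer for global context."""
    def __init__(self, ch=256, blocks=20, moves=4672):
        super().__init__()
        self.stem = nn.Conv2d(20, ch, 3, padding=1)   # 20 input planes
        self.res = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        layer = nn.TransformerEncoderLayer(d_model=ch, nhead=8,
                                           batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=2)
        self.policy = nn.Linear(ch, moves)  # move logits (denoised downstream)
        self.value = nn.Linear(ch, 1)

    def forward(self, x):                    # x: (B, 20, 8, 8)
        h = self.res(torch.relu(self.stem(x)))
        seq = h.flatten(2).transpose(1, 2)   # (B, 64, ch): one token per square
        seq = self.attn(seq)
        g = seq.mean(dim=1)                  # pooled board embedding
        return self.policy(g), torch.tanh(self.value(g))
```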
### 🔄 **Self-Play Learning**
- **Continuous Self-Improvement**: Learns by playing against itself
- **Balanced Reward System**: Rewards for wins, penalties for losses, and material-based rewards (e.g., +9 for capturing a queen, -9 for losing one); see the sketch after this list
- **Move Limit Penalty**: Extra penalty if the game ends by move limit
- **Experience Replay**: Trains from a buffer of past games
- **Early Stopping**: Prevents overfitting during training
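A minimal sketch of the material part of such a reward, using python-chess. Only the +/-9 queen value is confirmed by the card; the other piece values are conventional assumptions, and the 0.5 stalling penalty is a placeholder.

```python
import chess

# Conventional material values; the card only confirms +/-9 for a queen.
PIECE_VALUE = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
               chess.ROOK: 5, chess.QUEEN: 9}

def material_reward(board: chess.Board, move: chess.Move) -> float:
    """Reward, from the mover's perspective, for material won by `move`."""
    if board.is_capture(move):
        captured = board.piece_at(move.to_square)
        if captured is None:            # en passant: the pawn sits elsewhere
            return float(PIECE_VALUE[chess.PAWN])
        return float(PIECE_VALUE.get(captured.piece_type, 0))
    return 0.0

def terminal_reward(result: str, ended_by_move_limit: bool) -> float:
    # From white's perspective: +1 win / -1 loss / 0 draw, with an
    # extra stalling penalty whose magnitude (0.5) is assumed.
    base = {"1-0": 1.0, "0-1": -1.0, "1/2-1/2": 0.0}[result]
    return base - (0.5 if ended_by_move_limit else 0.0)
```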
### 🎯 **Experimental & Evaluation Features**
- **Diffusion-Based Exploration**: Denoising process for creative and robust move selection
- **Transformer Block**: Integrates a transformer encoder for global board context
- **ELO Evaluation**: Built-in script to estimate model strength via ELO rating
- **Top-N Move Restriction**: Optionally restricts move selection to the top-N MCTS candidates (sketched after this list)
- **No Handcrafted Heuristics**: All strategy is learned, not hardcoded
- **Advanced Board/Move Encoding**: Includes en passant, repetition, 50-move rule, and robust move encoding/decoding
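For the top-N restriction, a minimal sketch: sample a move from only the N most-visited MCTS children, with an optional temperature. Function and argument names are illustrative, not the repo's API.

```python
import numpy as np

def top_n_policy(visit_counts: dict, n: int = 5, temperature: float = 1.0):
    """Sample a move from the N most-visited MCTS candidates."""
    # Keep only the top-N moves by visit count.
    top = sorted(visit_counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
    moves, visits = zip(*top)
    # Temperature-scaled visit counts become sampling probabilities.
    probs = np.asarray(visits, dtype=np.float64) ** (1.0 / temperature)
    probs /= probs.sum()
    return moves[int(np.random.choice(len(moves), p=probs))]
```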
### 🖥️ **Modern UI**
- **Streamlit Web App**: Drag-and-drop chess play against the agent, with Hugging Face Hub model loading and browser UI
## Architecture Details
o1 now features a hybrid neural architecture:
- **Hybrid Policy-Value Network**: Residual blocks for local patterns, transformer encoder for global context, and a denoising diffusion process for robust move selection
- **Advanced Board Representation**: 20 input channels including piece types, castling rights, move count, en passant, repetition, and 50-move rule (an illustrative encoder follows this list)
- **MCTS Integration**: Tree search guided by neural network evaluations
- **Self-Play Pipeline**: Automated training through continuous game generation
- **Material Reward System**: Rewards/penalties for capturing or losing pieces
- **ELO Evaluation**: Play games between agents and estimate ELO rating
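A hedged sketch of a 20-plane encoder covering the features the card lists; the actual plane ordering and scaling in the repo are unknown, so treat this as one plausible layout.

```python
import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    """20x8x8 tensor; plane order and scaling here are assumptions."""
    planes = np.zeros((20, 8, 8), dtype=np.float32)
    # Planes 0-11: one-hot piece maps (6 white types, then 6 black).
    for sq, piece in board.piece_map().items():
        idx = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        planes[idx, sq // 8, sq % 8] = 1.0
    # Planes 12-15: castling rights as constant planes.
    planes[12] = board.has_kingside_castling_rights(chess.WHITE)
    planes[13] = board.has_queenside_castling_rights(chess.WHITE)
    planes[14] = board.has_kingside_castling_rights(chess.BLACK)
    planes[15] = board.has_queenside_castling_rights(chess.BLACK)
    planes[16] = board.fullmove_number / 100.0       # scaled move count
    if board.ep_square is not None:                  # en passant target
        planes[17, board.ep_square // 8, board.ep_square % 8] = 1.0
    planes[18] = float(board.is_repetition(2))       # repetition flag
    planes[19] = board.halfmove_clock / 100.0        # 50-move rule counter
    return planes
```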
## Usage: Play Against o1
You can load the model and play against o1 directly from the Hugging Face Hub, or use the Streamlit web app. The snippet below is only a sketch: the checkpoint filename (`model.pt`) and the network class / move-selection helper names are assumptions, as the card does not publish the exact loading API.
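```python
import chess
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint (the filename "model.pt" is an assumption).
ckpt_path = hf_hub_download(repo_id="FlameF0X/o1", filename="model.pt")
state_dict = torch.load(ckpt_path, map_location="cpu")

# `HybridNet` and `select_move` stand in for the repo's own network class
# and MCTS+diffusion helper; substitute the actual names from this repo.
# model = HybridNet()
# model.load_state_dict(state_dict)
# model.eval()

board = chess.Board()
# board.push(select_move(model, board))  # agent replies with one move
print(board)
```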
Or launch the web UI:
```powershell
pip install streamlit python-chess torch huggingface_hub
streamlit run app.py
```
## Requirements
- torch
- numpy
- python-chess
- huggingface_hub
- streamlit (for web UI)
## Performance
- Trained on millions of self-play games
- Demonstrates tactical awareness comparable to strong human players
- Capable of long-term strategic planning through MCTS
- Robust performance across all game phases
- Learns to value material and avoid stalling
- Estimated ELO of 257.3
## Applications
### Research
- Study emergent chess strategies through AI self-play
- Analyze the effectiveness of different neural architectures for game playing
- Experiment with denoising diffusion in strategic games
### Education
- Chess training partner with adjustable difficulty
- Analysis tool for game improvement
- Demonstration of reinforcement learning principles
### Competition
- Tournament-ready chess engine
- Benchmark for other chess AI systems
- Platform for testing new chess AI techniques
## Technical Specifications
- **Framework**: PyTorch with custom MCTS, hybrid diffusion+transformer model
- **Training Method**: Self-play reinforcement learning with early stopping
- **Architecture**: Residual + Transformer + Diffusion
- **Search Algorithm**: Monte Carlo Tree Search
- **Evaluation**: Combined policy and value head outputs; ELO estimation script (a generic Elo update is sketched after this list)
- **Reward System**: Game result, material, and move limit penalties
- **Board/Move Encoding**: 20-channel board tensor, robust move encoding/decoding
- **UI**: Streamlit drag-and-drop web app
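The card references an ELO script without describing its internals; the sketch below is the standard Elo update formula applied to a single game result, not the repo's implementation.

```python
def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Standard Elo update; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Example: the agent (rated 1000) beats a 1200-rated baseline.
agent, baseline = elo_update(1000.0, 1200.0, 1.0)
```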
## Citation
If you use o1 Chess Agent in your research, please cite:
```bibtex
@misc{o1-chess-agent,
  title={o1 Chess Agent: Deep Reinforcement Learning for Strategic Game Play},
  author={FlameF0X},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/FlameF0X/o1}}
}
```