---
license: apache-2.0
model_name: o1
repo: FlameF0X/o1
pipeline_tag: reinforcement-learning
tags:
- chess
- reinforcement-learning
- mcts
- diffusion hybrid
- alpha-zero-style
- deep-learning
- monte-carlo-tree-search
- self-play
- alphazero
- research
- games
new_version: FlameF0X/o2
---

# o1 Chess Agent

**A state-of-the-art chess AI powered by deep reinforcement learning, MCTS, and denoising diffusion**

![Chess Agent Demo](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/aUgmFN9FHU_7GU25vfLSb.webp)
*Image by Sora, OpenAI*

## Model Description

o1 is a modern chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and a denoising diffusion process for move selection. The model learns entirely by self-play, switching between black and white, and is trained with a sophisticated reward system that includes game outcome, material balance, and penalties for stalling.

## Key Features

### 🧠 **Advanced Architecture**
- **Hybrid Diffusion + Transformer Model**: Combines deep residual blocks, a transformer encoder, and a denoising diffusion process for a robust move policy and long-range reasoning (sketched after this list)
- **Deep Residual Neural Network**: 20+ residual blocks for strong pattern recognition
- **Monte Carlo Tree Search (MCTS)**: Looks ahead by simulating many possible futures
- **Denoising Diffusion**: Backward (reverse) diffusion process for creative move selection
- **Dual-Color Training**: Learns optimal play as both white and black
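
As a rough illustration of how these pieces might fit together, here is a minimal PyTorch sketch. The channel counts, tower depth, transformer size, move-space dimension (AlphaZero's 4672-way encoding), and the toy denoising loop are all assumptions; the repo's actual implementation is not shown on this card.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection

class HybridPolicyValueNet(nn.Module):
    """Residual tower for local patterns, transformer for global context,
    plus a toy reverse-diffusion pass over the policy logits."""

    def __init__(self, in_channels=20, channels=128, n_blocks=20, n_moves=4672):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        self.tower = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.policy_head = nn.Linear(channels * 64, n_moves)
        self.value_head = nn.Sequential(
            nn.Linear(channels * 64, 256), nn.ReLU(), nn.Linear(256, 1), nn.Tanh()
        )

    def forward(self, x):
        h = self.tower(self.stem(x))        # (B, C, 8, 8): local patterns
        seq = h.flatten(2).transpose(1, 2)  # (B, 64, C): one token per square
        flat = self.transformer(seq).flatten(1)
        return self.policy_head(flat), self.value_head(flat)

    def denoise_policy(self, x, steps: int = 10):
        """Illustrative reverse diffusion: start from noise and iteratively
        blend toward the predicted logits (the real schedule is unknown)."""
        logits, _ = self.forward(x)
        y = torch.randn_like(logits)
        for t in range(steps):
            alpha = (t + 1) / steps
            y = (1 - alpha) * y + alpha * logits
        return y
```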

### 🔄 **Self-Play Learning**
- **Continuous Self-Improvement**: Learns by playing against itself
- **Balanced Reward System**: Rewards for wins, penalties for losses, and material-based rewards (e.g., +9 for capturing a queen, -9 for losing one; see the sketch after this list)
- **Move Limit Penalty**: Extra penalty if the game ends by move limit
- **Experience Replay**: Trains from a buffer of past games
- **Early Stopping**: Prevents overfitting during training
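
To make the reward scheme concrete, here is a minimal sketch using `python-chess` and the standard piece values (pawn 1, knight/bishop 3, rook 5, queen 9, matching the ±9 queen example above). The move-limit penalty value is an assumption; the card does not state it.

```python
import chess

PIECE_VALUES = {
    chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
    chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0,
}

def material_reward(board: chess.Board, move: chess.Move) -> int:
    """Material gained by `move`, from the mover's perspective."""
    if board.is_en_passant(move):
        return PIECE_VALUES[chess.PAWN]
    captured = board.piece_at(move.to_square)
    return PIECE_VALUES[captured.piece_type] if captured else 0

def terminal_reward(board: chess.Board, is_white: bool,
                    hit_move_limit: bool, limit_penalty: float = 0.5) -> float:
    """Game-outcome reward; the move-limit penalty value is illustrative."""
    result = board.result(claim_draw=True)
    if result == "1-0":
        reward = 1.0 if is_white else -1.0
    elif result == "0-1":
        reward = -1.0 if is_white else 1.0
    else:
        reward = 0.0  # draw or unfinished game
    if hit_move_limit:
        reward -= limit_penalty  # extra penalty for stalling to the move limit
    return reward
```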

### 🎯 **Experimental & Evaluation Features**
- **Diffusion-Based Exploration**: Denoising process for creative and robust move selection
- **Transformer Block**: Integrates a transformer encoder for global board context
- **ELO Evaluation**: Built-in script to estimate model strength via ELO rating (estimation sketched after this list)
- **Top-N Move Restriction**: Optionally restricts move selection to the top-N MCTS candidates
- **No Handcrafted Heuristics**: All strategy is learned, not hardcoded
- **Advanced Board/Move Encoding**: Includes en passant, repetition, 50-move rule, and robust move encoding/decoding
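
The ELO script itself is not shown on this card, but the standard estimate works by inverting the ELO expected-score formula: given a score fraction `s` against an opponent of known rating, the performance rating is `R_opp - 400 * log10(1/s - 1)`. A sketch (the opponent rating and score below are purely illustrative):

```python
import math

def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard ELO expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def performance_rating(opponent_rating: float, score: float) -> float:
    """Rating that would produce `score` (fraction of points, 0..1) vs the opponent."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)  # avoid log of 0 or division by 0
    return opponent_rating - 400.0 * math.log10(1.0 / score - 1.0)

# e.g. scoring 30% against a 500-rated baseline gives roughly 353 ELO
print(round(performance_rating(500, 0.30), 1))
```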

### 🖥️ **Modern UI**
- **Streamlit Web App**: Drag-and-drop chess play against the agent, with Hugging Face Hub model loading and browser UI

## Architecture Details

o1 now features a hybrid neural architecture:
- **Hybrid Policy-Value Network**: Residual blocks for local patterns, transformer encoder for global context, and a denoising diffusion process for robust move selection
- **Advanced Board Representation**: 20 input channels including piece types, castling rights, move count, en passant, repetition, and 50-move rule (encoding sketched after this list)
- **MCTS Integration**: Tree search guided by neural network evaluations
- **Self-Play Pipeline**: Automated training through continuous game generation
- **Material Reward System**: Rewards/penalties for capturing or losing pieces
- **ELO Evaluation**: Play games between agents and estimate ELO rating
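
The 20 channels can be accounted for as 12 piece planes (6 piece types × 2 colors) plus 4 castling-rights planes, an en passant plane, a repetition plane, a 50-move-rule plane, and a move-count plane. The plane ordering and scaling below are assumptions, not necessarily the repo's actual layout:

```python
import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    """Encode a position as a 20x8x8 float tensor (plane layout assumed)."""
    planes = np.zeros((20, 8, 8), dtype=np.float32)
    # Planes 0-11: one plane per (color, piece type).
    for square, piece in board.piece_map().items():
        idx = (0 if piece.color == chess.WHITE else 6) + piece.piece_type - 1
        planes[idx, square // 8, square % 8] = 1.0
    # Planes 12-15: castling rights, broadcast over the whole board.
    rights = [
        board.has_kingside_castling_rights(chess.WHITE),
        board.has_queenside_castling_rights(chess.WHITE),
        board.has_kingside_castling_rights(chess.BLACK),
        board.has_queenside_castling_rights(chess.BLACK),
    ]
    for i, r in enumerate(rights):
        planes[12 + i].fill(float(r))
    # Plane 16: en passant target square, if any.
    if board.ep_square is not None:
        planes[16, board.ep_square // 8, board.ep_square % 8] = 1.0
    # Plane 17: repetition flag; plane 18: 50-move progress; plane 19: move count.
    planes[17].fill(float(board.is_repetition(2)))
    planes[18].fill(board.halfmove_clock / 100.0)
    planes[19].fill(min(board.fullmove_number / 200.0, 1.0))
    return planes
```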

## Usage: Play Against o1

You can load the model from the Hugging Face Hub and play against o1 in Python, or use the Streamlit web app. The snippet below is a minimal sketch: the checkpoint file name, the `ChessAgent` class, and its `select_move` method are assumptions standing in for the repo's actual code.

```python
import torch
import chess
from huggingface_hub import hf_hub_download

# Download the checkpoint (the file name is an assumption; check the repo's
# file listing for the actual checkpoint name).
checkpoint_path = hf_hub_download(repo_id="FlameF0X/o1", filename="model.pth")

# `ChessAgent` is a hypothetical stand-in for the repo's model/MCTS wrapper.
from model import ChessAgent  # hypothetical import from the repo's code

agent = ChessAgent()
agent.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
agent.eval()

board = chess.Board()
move = agent.select_move(board)  # hypothetical inference entry point
board.push(move)
print(board)
```

Or launch the web UI:

```powershell
pip install streamlit python-chess torch huggingface_hub
streamlit run app.py
```

## Requirements

- torch
- numpy
- python-chess
- huggingface_hub
- streamlit (for web UI)

## Performance

- Trained on millions of self-play games
- Demonstrates tactical awareness developed purely through self-play
- Capable of long-term strategic planning through MCTS
- Robust performance across all game phases
- Learns to value material and avoid stalling
- Estimated ELO of 257.3

## Applications

### Research
- Study emergent chess strategies through AI self-play
- Analyze the effectiveness of different neural architectures for game playing
- Experiment with denoising diffusion in strategic games

### Education
- Chess training partner with adjustable difficulty
- Analysis tool for game improvement
- Demonstration of reinforcement learning principles

### Competition
- Tournament-ready chess engine
- Benchmark for other chess AI systems
- Platform for testing new chess AI techniques

## Technical Specifications

- **Framework**: PyTorch with custom MCTS, hybrid diffusion+transformer model
- **Training Method**: Self-play reinforcement learning with early stopping
- **Architecture**: Residual + Transformer + Diffusion
- **Search Algorithm**: Monte Carlo Tree Search (selection step sketched after this list)
- **Evaluation**: Combined policy and value head outputs, ELO script
- **Reward System**: Game result, material, and move limit penalties
- **Board/Move Encoding**: 20-channel board tensor, robust move encoding/decoding
- **UI**: Streamlit drag-and-drop web app
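
For reference, AlphaZero-style MCTS picks the child to explore with the PUCT rule, `Q(s,a) + c_puct * P(s,a) * sqrt(N_parent) / (1 + N(s,a))`. Below is a minimal sketch of the selection step (the `c_puct` constant and node layout are assumptions); the top-N restriction mentioned above would simply limit this argmax, or the final move choice, to the N most-visited candidates.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                   # P(s, a) from the policy head
    visit_count: int = 0           # N(s, a)
    value_sum: float = 0.0         # W(s, a)
    children: dict = field(default_factory=dict)  # move -> Node

    @property
    def q(self) -> float:          # mean value Q(s, a) = W / N
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node: Node, c_puct: float = 1.5):
    """PUCT selection over an expanded node's children (assumes children exist)."""
    parent_visits = sum(c.visit_count for c in node.children.values())
    best_move, best_score = None, -math.inf
    for move, child in node.children.items():
        u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visit_count)
        if child.q + u > best_score:
            best_move, best_score = move, child.q + u
    return best_move, node.children[best_move]
```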

## Citation

If you use o1 Chess Agent in your research, please cite:

```bibtex
@misc{o1-chess-agent,
  title={o1 Chess Agent: Deep Reinforcement Learning for Strategic Game Play},
  author={FlameF0X},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/FlameF0X/o1}}
}
```