|
--- |
|
license: apache-2.0 |
|
model_name: o1 |
|
repo: FlameF0X/o1 |
|
pipeline_tag: reinforcement-learning |
|
tags: |
|
- chess |
|
- reinforcement-learning |
|
- mcts |
|
- diffusion hybrid |
|
- alpha-zero-style |
|
- deep-learning |
|
- monte-carlo-tree-search |
|
- self-play |
|
- alphazero |
|
- research |
|
- games |
|
new_version: FlameF0X/o2 |
|
--- |
|
|
|
# o1 Chess Agent |
|
|
|
**A chess AI powered by deep reinforcement learning, MCTS, and denoising diffusion**
|
|
|
 |
|
*Image by Sora, OpenAI* |
|
|
|
## Model Description |
|
|
|
o1 is a modern chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and a denoising diffusion process for move selection. The model learns entirely by self-play, switching between black and white, and is trained with a sophisticated reward system that includes game outcome, material balance, and penalties for stalling. |
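
During search, MCTS scores each candidate move by combining the network's value estimate with a prior-weighted exploration bonus. The snippet below sketches the AlphaZero-style PUCT selection rule such a search typically uses; the node layout and the `c_puct` constant are illustrative assumptions rather than o1's documented code.

```python
# Minimal sketch of AlphaZero-style PUCT child selection. The per-child
# statistics (prior, visits, value_sum) and c_puct are assumptions made
# for illustration; the actual o1 implementation may differ.
import math

def puct_select(children: dict, c_puct: float = 1.5):
    """Pick the move maximizing Q(s, a) + U(s, a) among a node's children."""
    total_visits = sum(c["visits"] for c in children.values())
    best_move, best_score = None, -math.inf
    for move, child in children.items():
        q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        u = c_puct * child["prior"] * math.sqrt(total_visits + 1) / (1 + child["visits"])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move
```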
|
|
|
## Key Features |
|
|
|
### **Advanced Architecture**
|
- **Hybrid Diffusion + Transformer Model**: Combines deep residual blocks, a transformer encoder, and a denoising diffusion process for a robust move policy and long-range reasoning (a minimal network sketch follows this list)
|
- **Deep Residual Neural Network**: 20+ residual blocks for strong pattern recognition |
|
- **Monte Carlo Tree Search (MCTS)**: Looks ahead by simulating many possible futures |
|
- **Denoising Diffusion**: Backward (reverse) diffusion process for creative move selection |
|
- **Dual-Color Training**: Learns optimal play as both white and black |
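
To make the hybrid-model bullet concrete: residual blocks handle local patterns, a transformer encoder over the 64 squares provides global context, and separate heads emit the move policy and position value. The sketch below follows that structure; the layer sizes, the AlphaZero-style move space of 4672, and the omission of the diffusion head are simplifying assumptions, not the actual o1 definition.

```python
# Simplified hybrid policy-value network: residual conv blocks for local
# patterns plus a transformer encoder over the 64 squares for global context.
# Channel counts, depths, and the move-space size are assumptions; the
# diffusion head is omitted for brevity.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=128):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)

class HybridPolicyValueNet(nn.Module):
    def __init__(self, in_ch=20, ch=128, n_res=20, n_moves=4672):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU())
        self.res = nn.Sequential(*[ResBlock(ch) for _ in range(n_res)])
        enc_layer = nn.TransformerEncoderLayer(d_model=ch, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.policy = nn.Linear(ch * 64, n_moves)   # move logits
        self.value = nn.Sequential(nn.Linear(ch * 64, 256), nn.ReLU(), nn.Linear(256, 1), nn.Tanh())

    def forward(self, board_tensor):                 # (B, 20, 8, 8)
        h = self.res(self.stem(board_tensor))        # (B, ch, 8, 8)
        seq = h.flatten(2).transpose(1, 2)           # (B, 64, ch) -- one token per square
        seq = self.transformer(seq)
        flat = seq.flatten(1)
        return self.policy(flat), self.value(flat)
```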
|
|
|
### **Self-Play Learning**
|
- **Continuous Self-Improvement**: Learns by playing against itself |
|
- **Balanced Reward System**: Rewards for wins, penalties for losses, and material-based rewards (e.g., +9 for capturing a queen, -9 for losing one); see the reward sketch after this list
|
- **Move Limit Penalty**: Extra penalty if the game ends by move limit |
|
- **Experience Replay**: Trains from a buffer of past games |
|
- **Early Stopping**: Prevents overfitting during training |
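
The sketch below illustrates the reward shaping described in this list: terminal game result, material swings, and an extra penalty when the game is cut off by the move limit. The win/loss and move-limit constants are illustrative assumptions; the queen value of 9 matches the example above.

```python
# Sketch of the shaped reward: game outcome, material swings, and a penalty
# when the game ends by hitting the move limit. The win/loss and move-limit
# constants are assumptions for illustration; the queen value (9) matches
# the +9 / -9 example in this card.
import chess

PIECE_VALUE = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
               chess.ROOK: 5, chess.QUEEN: 9}

def capture_value(board: chess.Board, move: chess.Move) -> int:
    """Material gained by the side playing `move`; the opponent receives the negative."""
    captured = board.piece_at(move.to_square)        # (en passant omitted for brevity)
    return PIECE_VALUE.get(captured.piece_type, 0) if captured else 0

def terminal_reward(board: chess.Board, color: bool, hit_move_limit: bool,
                    win: float = 10.0, move_limit_penalty: float = -2.0) -> float:
    """Reward assigned to `color` when the game ends."""
    if hit_move_limit:
        return move_limit_penalty                    # extra penalty for stalling
    outcome = board.outcome()
    if outcome is None or outcome.winner is None:    # ongoing game or draw
        return 0.0
    return win if outcome.winner == color else -win
```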
|
|
|
### **Experimental & Evaluation Features**
|
- **Diffusion-Based Exploration**: Denoising process for creative and robust move selection |
|
- **Transformer Block**: Integrates a transformer encoder for global board context |
|
- **ELO Evaluation**: Built-in script to estimate model strength via ELO rating (see the estimation sketch after this list)
|
- **Top-N Move Restriction**: Optionally restricts move selection to the top-N MCTS candidates |
|
- **No Handcrafted Heuristics**: All strategy is learned, not hardcoded |
|
- **Advanced Board/Move Encoding**: Includes en passant, repetition, 50-move rule, and robust move encoding/decoding |
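
The ELO evaluation script itself is not reproduced in this card; the sketch below shows the standard logistic-model calculation that turns head-to-head results against a reference opponent into an estimated rating, which is what such a script typically computes. The reference rating in the example is hypothetical.

```python
# Sketch of Elo estimation from match results against a reference opponent
# with a known rating, using the standard logistic expected-score model.
# This is an illustration, not the repository's evaluation script.
import math

def elo_from_score(score: float, opponent_elo: float = 1500.0) -> float:
    """Estimate a rating from the fraction of points scored against `opponent_elo`."""
    score = min(max(score, 1e-6), 1 - 1e-6)          # avoid log(0) at 0% / 100%
    return opponent_elo + 400.0 * math.log10(score / (1.0 - score))

# Example: 12 points from 100 games against a hypothetical 500-rated baseline
print(round(elo_from_score(12 / 100, opponent_elo=500.0), 1))
```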
|
|
|
### **Modern UI**
|
- **Streamlit Web App**: Drag-and-drop chess play against the agent, with Hugging Face Hub model loading and browser UI |
|
|
|
## Architecture Details |
|
|
|
o1 now features a hybrid neural architecture: |
|
- **Hybrid Policy-Value Network**: Residual blocks for local patterns, transformer encoder for global context, and a denoising diffusion process for robust move selection |
|
- **Advanced Board Representation**: 20 input channels including piece types, castling rights, move count, en passant, repetition, and 50-move rule (see the encoding sketch after this list)
|
- **MCTS Integration**: Tree search guided by neural network evaluations |
|
- **Self-Play Pipeline**: Automated training through continuous game generation |
|
- **Material Reward System**: Rewards/penalties for capturing or losing pieces |
|
- **ELO Evaluation**: Play games between agents and estimate ELO rating |
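
One plausible layout for the 20-channel board tensor listed above is 12 piece planes (6 piece types × 2 colors), 4 castling-rights planes, and one plane each for en passant, move count, repetition, and the 50-move rule. The sketch below follows that layout; the exact plane ordering and normalization in o1 may differ.

```python
# Sketch of a 20-channel board encoding consistent with the channel list above.
# Plane ordering and normalization constants are assumptions; the actual o1
# encoder may differ.
import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    planes = np.zeros((20, 8, 8), dtype=np.float32)
    # 0-11: one plane per (color, piece type)
    for square, piece in board.piece_map().items():
        idx = (0 if piece.color == chess.WHITE else 6) + piece.piece_type - 1
        planes[idx, square // 8, square % 8] = 1.0
    # 12-15: castling rights as constant planes
    planes[12] = float(board.has_kingside_castling_rights(chess.WHITE))
    planes[13] = float(board.has_queenside_castling_rights(chess.WHITE))
    planes[14] = float(board.has_kingside_castling_rights(chess.BLACK))
    planes[15] = float(board.has_queenside_castling_rights(chess.BLACK))
    # 16: en passant target square, if any
    if board.ep_square is not None:
        planes[16, board.ep_square // 8, board.ep_square % 8] = 1.0
    # 17: move count, 18: repetition flag, 19: 50-move-rule clock (normalized)
    planes[17] = board.fullmove_number / 100.0
    planes[18] = float(board.is_repetition(2))
    planes[19] = board.halfmove_clock / 100.0
    return planes
```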
|
|
|
## Usage: Play Against o1 |
|
|
|
You can load the model from the Hugging Face Hub and play against o1, or use the Streamlit web app. The snippet below is a minimal loading sketch; the checkpoint filename and the agent wrapper are assumptions, not documented in this card:
|
|
|
```python
# Minimal loading sketch -- the checkpoint filename and the agent/network
# classes are assumptions, not documented in this card; adapt them to the
# files actually published in FlameF0X/o1.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="FlameF0X/o1", filename="model.pt")  # assumed filename
state_dict = torch.load(ckpt_path, map_location="cpu")

# import chess
# agent = O1Agent(state_dict)          # hypothetical wrapper: network + MCTS + diffusion
# board = chess.Board()
# board.push(agent.select_move(board)) # hypothetical move-selection call
```
|
|
|
Or launch the web UI: |
|
|
|
```powershell |
|
pip install streamlit python-chess torch huggingface_hub |
|
streamlit run app.py |
|
``` |
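
The repository's app.py is not reproduced here; the sketch below shows one way such a Streamlit front end could look. The agent-loading and reply calls are hypothetical placeholders, and the real app additionally offers drag-and-drop play and Hub model loading.

```python
# app.py -- minimal Streamlit front-end sketch. The agent loading and reply
# generation are hypothetical placeholders, not the repository's actual code.
import chess
import chess.svg
import streamlit as st

if "board" not in st.session_state:
    st.session_state.board = chess.Board()
board = st.session_state.board

st.title("Play against o1")
st.markdown(chess.svg.board(board, size=400), unsafe_allow_html=True)  # render position as SVG

move_uci = st.text_input("Your move in UCI notation (e.g. e2e4)")
if st.button("Play") and move_uci:
    try:
        board.push_uci(move_uci)
        # reply = agent.select_move(board)   # hypothetical o1 reply via MCTS
        # board.push(reply)
    except ValueError:
        st.warning("Illegal or malformed move")
    st.rerun()
```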
|
|
|
## Requirements |
|
|
|
- torch |
|
- numpy |
|
- python-chess |
|
- huggingface_hub |
|
- streamlit (for web UI) |
|
|
|
## Performance |
|
|
|
- Trained on millions of self-play games |
|
- Demonstrates tactical awareness learned purely from self-play
|
- Capable of long-term strategic planning through MCTS |
|
- Robust performance across all game phases |
|
- Learns to value material and avoid stalling |
|
- Estimated ELO of 257.3 |
|
## Applications |
|
|
|
### Research |
|
- Study emergent chess strategies through AI self-play |
|
- Analyze the effectiveness of different neural architectures for game playing |
|
- Experiment with denoising diffusion in strategic games |
|
|
|
### Education |
|
- Chess training partner with adjustable difficulty |
|
- Analysis tool for game improvement |
|
- Demonstration of reinforcement learning principles |
|
|
|
### Competition |
|
- Tournament-ready chess engine |
|
- Benchmark for other chess AI systems |
|
- Platform for testing new chess AI techniques |
|
|
|
## Technical Specifications |
|
|
|
- **Framework**: PyTorch with custom MCTS, hybrid diffusion+transformer model |
|
- **Training Method**: Self-play reinforcement learning with early stopping |
|
- **Architecture**: Residual + Transformer + Diffusion |
|
- **Search Algorithm**: Monte Carlo Tree Search |
|
- **Evaluation**: Combined policy and value head outputs, ELO script |
|
- **Reward System**: Game result, material, and move limit penalties |
|
- **Board/Move Encoding**: 20-channel board tensor, robust move encoding/decoding |
|
- **UI**: Streamlit drag-and-drop web app |
|
|
|
## Citation |
|
|
|
If you use o1 Chess Agent in your research, please cite: |
|
|
|
```bibtex
@misc{o1-chess-agent,
  title={o1 Chess Agent: Deep Reinforcement Learning for Strategic Game Play},
  author={FlameF0X},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/FlameF0X/o1}}
}
```