FlameF0X committed · verified
Commit 5cdcdf6 · 1 Parent(s): 4f4847e

Update README.md

Files changed (1):
  1. README.md +29 -19
README.md CHANGED
@@ -20,39 +20,42 @@ tags:

  # o1 Chess Agent

- **A state-of-the-art chess AI powered by deep reinforcement learning and self-play**
+ **A state-of-the-art chess AI powered by deep reinforcement learning, MCTS, and denoising diffusion**

  ![Chess Agent Demo](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/aUgmFN9FHU_7GU25vfLSb.webp)
  *Image by Sora, OpenAI*

  ## Model Description

- **o1** is a state-of-the-art chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and optional diffusion-based exploration. Through continuous self-play and deep reinforcement learning, o1 develops sophisticated chess strategies and tactical awareness. The model learns by self-play, switching between black and white, and is penalized for losses as either color. It is designed for research, experimentation, and competitive play.
+ o1 is a modern chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and a denoising diffusion process for move selection. The model learns entirely by self-play, switching between black and white, and is trained with a sophisticated reward system that includes game outcome, material balance, and penalties for stalling.

  ## Key Features

  ### 🧠 **Advanced Architecture**
- - **Deep Residual Neural Network**: 10+ residual blocks inspired by AlphaZero for superior pattern recognition
- - **Monte Carlo Tree Search (MCTS)**: Strategic lookahead planning that evaluates millions of potential game states
- - **Dual-Color Training**: Learns optimal play from both white and black perspectives
+ - **Deep Residual Neural Network**: 20+ residual blocks for strong pattern recognition
+ - **Monte Carlo Tree Search (MCTS)**: Looks ahead by simulating many possible futures
+ - **Denoising Diffusion**: Backward (reverse) diffusion process for robust move policy
+ - **Dual-Color Training**: Learns optimal play as both white and black

  ### 🔄 **Self-Play Learning**
- - **Continuous Self-Improvement**: Battles against itself to discover new strategies and counter-strategies
- - **Balanced Reward System**: Penalized for losses as either color, ensuring robust performance across all game phases
- - **Experience Replay**: Learns from historical games to reinforce successful patterns
+ - **Continuous Self-Improvement**: Learns by playing against itself
+ - **Balanced Reward System**: Rewards for wins, penalties for losses, and material-based rewards (e.g., +9 for capturing a queen, -9 for losing one)
+ - **Move Limit Penalty**: Extra penalty if the game ends by move limit
+ - **Experience Replay**: Trains from a buffer of past games

  ### 🎯 **Experimental Features**
- - **Diffusion-Based Exploration**: Optional diffusion process introduces creative variations in move selection
- - **Adaptive Strategy**: Adjusts playing style based on opponent patterns and game state
+ - **Diffusion-Based Exploration**: Denoising process for creative and robust move selection
+ - **Top-N Move Restriction**: Optionally restricts move selection to the top-N MCTS candidates
+ - **No Handcrafted Heuristics**: All strategy is learned, not hardcoded

  ## Architecture Details

  o1 combines several cutting-edge techniques:
-
  - **Policy-Value Network**: Simultaneous move prediction and position evaluation
  - **Residual Connections**: Deep architecture with skip connections for stable training
  - **MCTS Integration**: Tree search guided by neural network evaluations
  - **Self-Play Pipeline**: Automated training through continuous game generation
+ - **Material Reward System**: Rewards/penalties for capturing or losing pieces

  ## Usage: Play Against o1

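For readers who want a concrete picture of the "Policy-Value Network" and "Residual Connections" items above, here is a minimal PyTorch sketch of an AlphaZero-style policy-value network. It is an illustration only: the input encoding (18 planes), channel width, block count, and 4672-move output size are assumptions, not the released model's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection for stable deep training."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class PolicyValueNet(nn.Module):
    """Board planes in; move logits (policy) and a scalar evaluation (value) out."""
    def __init__(self, in_planes=18, channels=128, n_blocks=20, n_moves=4672):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.Flatten(), nn.Linear(2 * 8 * 8, n_moves)
        )
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(8 * 8, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, x):
        x = self.blocks(self.stem(x))
        return self.policy_head(x), self.value_head(x)

# Shape check only; the real board-to-tensor encoding is defined by the repository.
policy_logits, value = PolicyValueNet()(torch.zeros(1, 18, 8, 8))
print(policy_logits.shape, value.shape)  # torch.Size([1, 4672]) torch.Size([1, 1])
```

The Tanh on the value head (a common convention, not confirmed by this diff) keeps evaluations in [-1, 1], which lines up with a win/draw/loss reward scale.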
 
@@ -80,31 +83,37 @@ print("Policy logits shape:", policy_logits.shape)
  print("Value:", value.item())
  ```

+ ## Try it in your browser
+
+ A Streamlit web app is included for drag-and-drop chess play against o1. You can deploy this as a Hugging Face Space or run locally:
+
+ ```powershell
+ pip install streamlit python-chess torch huggingface_hub
+ streamlit run app.py
+ ```
+
  ## Requirements

  - torch
  - numpy
  - python-chess
  - huggingface_hub
-
- Install with:
- ```powershell
- pip install torch numpy python-chess huggingface_hub
- ```
+ - streamlit (for web UI)

  ## Performance

  - Trained on millions of self-play games
  - Demonstrates tactical awareness comparable to strong human players
  - Capable of long-term strategic planning through MCTS
- - Robust performance across different game phases (opening, middlegame, endgame)
+ - Robust performance across all game phases
+ - Learns to value material and avoid stalling

  ## Applications

  ### Research
  - Study emergent chess strategies through AI self-play
  - Analyze the effectiveness of different neural architectures for game playing
- - Experiment with diffusion-based exploration in strategic games
+ - Experiment with denoising diffusion in strategic games

  ### Education
  - Chess training partner with adjustable difficulty
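The new "Try it in your browser" section points to a Streamlit front end (app.py) whose source is not part of this diff. Purely as a sketch of what a minimal version could look like, assuming a text-based move box rather than the drag-and-drop board the README mentions, and with o1_reply standing in for the real model's search:

```python
# Hypothetical sketch only; the repository's app.py may look quite different.
import chess
import streamlit as st

def o1_reply(board: chess.Board) -> chess.Move:
    """Placeholder for the real MCTS + diffusion move search."""
    return next(iter(board.legal_moves))  # stand-in: any legal move

st.title("Play against o1")

if "board" not in st.session_state:
    st.session_state.board = chess.Board()
board = st.session_state.board

uci = st.text_input("Your move (UCI, e.g. e2e4)")
if st.button("Play") and uci:
    try:
        move = chess.Move.from_uci(uci)
        if move in board.legal_moves:
            board.push(move)                 # human move
            if not board.is_game_over():
                board.push(o1_reply(board))  # engine reply
        else:
            st.warning("Illegal move.")
    except ValueError:
        st.warning("Could not parse that move.")

st.text(str(board))  # ASCII board; a real app might render SVG instead
if board.is_game_over():
    st.success(f"Game over: {board.result()}")
```

Launched with `streamlit run app.py`, as in the install snippet shown in the hunk above.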
@@ -118,11 +127,12 @@ pip install torch numpy python-chess huggingface_hub

  ## Technical Specifications

- - **Framework**: PyTorch with custom MCTS implementation
+ - **Framework**: PyTorch with custom MCTS and denoising diffusion
  - **Training Method**: Self-play reinforcement learning
  - **Architecture**: Deep residual neural network
  - **Search Algorithm**: Monte Carlo Tree Search
  - **Evaluation**: Combined policy and value head outputs
+ - **Reward System**: Game result, material, and move limit penalties

  ## Citation
 
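One note on the "Reward System: Game result, material, and move limit penalties" line added in the last hunk, together with the +9/-9 queen figures quoted earlier: the diff does not give the exact formula, so the following python-chess sketch is only one plausible reading of those bullets, with standard piece values and an invented move-limit penalty constant.

```python
import chess

# Standard material values; the README quotes +/-9 for a queen.
PIECE_VALUES = {
    chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
    chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0,
}

def material_reward(board: chess.Board, move: chess.Move) -> int:
    """Material gained by the side to move when it plays `move` (0 if no capture)."""
    if board.is_en_passant(move):
        return PIECE_VALUES[chess.PAWN]
    captured = board.piece_at(move.to_square)
    return PIECE_VALUES[captured.piece_type] if captured else 0

def terminal_reward(board: chess.Board, agent_is_white: bool,
                    hit_move_limit: bool, move_limit_penalty: float = 0.5) -> float:
    """Game-outcome reward, with an extra penalty if the move limit ended the game."""
    result = board.result(claim_draw=True)  # "1-0", "0-1", "1/2-1/2", or "*"
    if result == "1-0":
        outcome = 1.0 if agent_is_white else -1.0
    elif result == "0-1":
        outcome = -1.0 if agent_is_white else 1.0
    else:
        outcome = 0.0
    if hit_move_limit:
        outcome -= move_limit_penalty  # discourage stalling (illustrative constant)
    return outcome

# Example: after 1.e4 d5, the capture 2.exd5 is worth +1 (a pawn) to White.
board = chess.Board()
board.push_san("e4")
board.push_san("d5")
print(material_reward(board, board.parse_san("exd5")))  # 1
```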