FlameF0X/o1

@@ -20,39 +20,42 @@ tags:
 # o1 Chess Agent
-**A state-of-the-art chess AI powered by deep reinforcement learning and self-play**
 ![Chess Agent Demo](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/aUgmFN9FHU_7GU25vfLSb.webp)
 *Image by Sora, OpenAI*
 ## Model Description
-**o1** is a state-of-the-art chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and optional diffusion-based exploration. Through continuous self-play and deep reinforcement learning, o1 develops sophisticated chess strategies and tactical awareness. The model learns by self-play, switching between black and white, and is penalized for losses as either color. It is designed for research, experimentation, and competitive play.
 ## Key Features
 ### 🧠 **Advanced Architecture**
-- **Deep Residual Neural Network**: 10+ residual blocks inspired by AlphaZero for superior pattern recognition
-- **Monte Carlo Tree Search (MCTS)**: Strategic lookahead planning that evaluates millions of potential game states
-- **Dual-Color Training**: Learns optimal play from both white and black perspectives
 ### 🔄 **Self-Play Learning**
-- **Continuous Self-Improvement**: Battles against itself to discover new strategies and counter-strategies
-- **Balanced Reward System**: Penalized for losses as either color, ensuring robust performance across all game phases
-- **Experience Replay**: Learns from historical games to reinforce successful patterns
 ### 🎯 **Experimental Features**
-- **Diffusion-Based Exploration**: Optional diffusion process introduces creative variations in move selection
-- **Adaptive Strategy**: Adjusts playing style based on opponent patterns and game state
 ## Architecture Details
 o1 combines several cutting-edge techniques:
 - **Policy-Value Network**: Simultaneous move prediction and position evaluation
 - **Residual Connections**: Deep architecture with skip connections for stable training
 - **MCTS Integration**: Tree search guided by neural network evaluations
 - **Self-Play Pipeline**: Automated training through continuous game generation
 ## Usage: Play Against o1
@@ -80,31 +83,37 @@ print("Policy logits shape:", policy_logits.shape)
 print("Value:", value.item())
 ```
 ## Requirements
 - torch
 - numpy
 - python-chess
 - huggingface_hub
-Install with:
-```powershell
-pip install torch numpy python-chess huggingface_hub
-```
 ## Performance
 - Trained on millions of self-play games
 - Demonstrates tactical awareness comparable to strong human players
 - Capable of long-term strategic planning through MCTS
-- Robust performance across different game phases (opening, middlegame, endgame)
 ## Applications
 ### Research
 - Study emergent chess strategies through AI self-play
 - Analyze the effectiveness of different neural architectures for game playing
-- Experiment with diffusion-based exploration in strategic games
 ### Education
 - Chess training partner with adjustable difficulty
@@ -118,11 +127,12 @@ pip install torch numpy python-chess huggingface_hub
 ## Technical Specifications
-- **Framework**: PyTorch with custom MCTS implementation
 - **Training Method**: Self-play reinforcement learning
 - **Architecture**: Deep residual neural network
 - **Search Algorithm**: Monte Carlo Tree Search
 - **Evaluation**: Combined policy and value head outputs
 ## Citation

 # o1 Chess Agent
+**A state-of-the-art chess AI powered by deep reinforcement learning, MCTS, and denoising diffusion**
 ![Chess Agent Demo](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/aUgmFN9FHU_7GU25vfLSb.webp)
 *Image by Sora, OpenAI*
 ## Model Description
+o1 is a chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and a denoising diffusion process for move selection. The model learns entirely by self-play, alternating between black and white, and is trained with a reward signal that combines game outcome, material balance, and a penalty for games that end by move limit.
 ## Key Features
 ### 🧠 **Advanced Architecture**
+- **Deep Residual Neural Network**: 20+ residual blocks for strong pattern recognition
+- **Monte Carlo Tree Search (MCTS)**: Looks ahead by simulating many possible futures
+- **Denoising Diffusion**: Reverse (denoising) diffusion process for a more robust move policy
+- **Dual-Color Training**: Learns optimal play as both white and black
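
The model card does not say how the tree search picks which branch to explore. As a rough illustration only, the sketch below uses the AlphaZero-style PUCT rule, which is an assumption and not confirmed as o1's method. The function names, the `children` dictionary layout, and the `c_puct` default are all hypothetical.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """Mean value plus an exploration bonus that shrinks as the child is visited."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + exploration


def select_child(children, c_puct=1.5):
    """Pick the move with the highest PUCT score.

    `children` maps a move string to a dict with keys
    "q" (mean value), "prior" (network policy probability), "visits".
    This layout is hypothetical, chosen only for the example.
    """
    parent_visits = sum(child["visits"] for child in children.values())
    return max(
        children,
        key=lambda move: puct_score(
            children[move]["q"],
            children[move]["prior"],
            parent_visits,
            children[move]["visits"],
            c_puct,
        ),
    )


if __name__ == "__main__":
    children = {
        "e2e4": {"q": 0.1, "prior": 0.6, "visits": 10},
        "d2d4": {"q": 0.0, "prior": 0.3, "visits": 0},
        "a2a3": {"q": 0.0, "prior": 0.1, "visits": 0},
    }
    print(select_child(children))
```

The network's policy output supplies `prior` and its value output feeds `q`, which is one way a search can be "guided by neural network evaluations".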
 ### 🔄 **Self-Play Learning**
+- **Continuous Self-Improvement**: Learns by playing against itself
+- **Balanced Reward System**: Rewards for wins, penalties for losses, and material-based rewards (e.g., +9 for capturing a queen, -9 for losing one)
+- **Move Limit Penalty**: Extra penalty if the game ends by move limit
+- **Experience Replay**: Trains from a buffer of past games
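
For readers unfamiliar with experience replay, here is a minimal sketch of a fixed-size buffer of past positions. The class name, its methods, and the capacity are hypothetical; the actual buffer used to train o1 is not shown in the model card.

```python
import random
from collections import deque


class ReplayBuffer:
    """Keeps the most recent training samples and serves random mini-batches."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest items once full
        self.buffer = deque(maxlen=capacity)

    def add_game(self, samples):
        """Append every sample from one finished self-play game."""
        self.buffer.extend(samples)

    def sample(self, batch_size, rng=random):
        """Return up to `batch_size` samples drawn without replacement."""
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3)
    buf.add_game(["pos1", "pos2", "pos3", "pos4"])
    print(len(buf), buf.sample(2))
```

Sampling at random from many past games, instead of training only on the latest one, reduces the correlation between consecutive training examples.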
 ### 🎯 **Experimental Features**
+- **Diffusion-Based Exploration**: Denoising process for creative and robust move selection
+- **Top-N Move Restriction**: Optionally restricts move selection to the top-N MCTS candidates
+- **No Handcrafted Heuristics**: All strategy is learned, not hardcoded
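
The Top-N restriction above can be sketched as follows, assuming the search returns a visit count per candidate move. The function name, the `temperature` parameter, and the dictionary input are hypothetical and are not taken from the o1 code.

```python
import random


def select_move(visit_counts, top_n=None, temperature=1.0, rng=random):
    """Choose a move from MCTS visit counts, optionally limited to the top-N.

    visit_counts: dict mapping a move (e.g. a UCI string) to its visit count.
    top_n: if given, only the N most-visited moves remain eligible.
    temperature: 0 picks the most-visited move; larger values flatten the odds.
    """
    ranked = sorted(visit_counts.items(), key=lambda kv: kv[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    moves = [move for move, _ in ranked]
    if temperature == 0:
        return moves[0]
    weights = [count ** (1.0 / temperature) for _, count in ranked]
    if sum(weights) == 0:
        # No visits recorded: fall back to a uniform choice
        return rng.choice(moves)
    return rng.choices(moves, weights=weights, k=1)[0]


if __name__ == "__main__":
    counts = {"e2e4": 50, "d2d4": 30, "g1f3": 15, "a2a3": 5}
    print(select_move(counts, top_n=2))
```

With `top_n=2` in this example, only `e2e4` and `d2d4` can be returned, which keeps exploratory sampling away from rarely visited moves.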
 ## Architecture Details
 o1 combines several cutting-edge techniques:
 - **Policy-Value Network**: Simultaneous move prediction and position evaluation
 - **Residual Connections**: Deep architecture with skip connections for stable training
 - **MCTS Integration**: Tree search guided by neural network evaluations
 - **Self-Play Pipeline**: Automated training through continuous game generation
+- **Material Reward System**: Rewards/penalties for capturing or losing pieces
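
To make the material reward concrete, here is a minimal sketch that scores the change in material between two positions given as FEN strings. Only the queen's value (9) comes from the model card. The other piece values are the conventional ones and are an assumption, as are the function names.

```python
# Only the queen's value (9) is stated in the model card; the rest are
# conventional values assumed for illustration.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_balance(fen):
    """White material minus black material, read from a FEN string."""
    placement = fen.split()[0]  # ignore side to move, castling rights, etc.
    total = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits, '/' separators, and kings carry no value
        total += value if ch.isupper() else -value
    return total


def material_reward(fen_before, fen_after, white_moved):
    """Material gained by the side that just moved (negative if it lost material)."""
    delta = material_balance(fen_after) - material_balance(fen_before)
    return delta if white_moved else -delta


if __name__ == "__main__":
    before = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    # Same position with the black queen removed, as if White had captured it
    after = "rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR b KQkq - 0 1"
    print(material_reward(before, after, white_moved=True))
```

In a full training loop this term would be added to the game-outcome reward and the move-limit penalty; how o1 weights the three is not stated in the card.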
 ## Usage: Play Against o1
 print("Value:", value.item())
 ```
+## Try it in your browser
+A Streamlit web app is included for drag-and-drop chess play against o1. You can deploy it as a Hugging Face Space or run it locally:
+```powershell
+pip install streamlit python-chess torch huggingface_hub
+streamlit run app.py
+```
 ## Requirements
 - torch
 - numpy
 - python-chess
 - huggingface_hub
+- streamlit (for web UI)
 ## Performance
 - Trained on millions of self-play games
 - Demonstrates tactical awareness comparable to strong human players
 - Capable of long-term strategic planning through MCTS
+- Robust performance across all game phases
+- Learns to value material and avoid stalling
 ## Applications
 ### Research
 - Study emergent chess strategies through AI self-play
 - Analyze the effectiveness of different neural architectures for game playing
+- Experiment with denoising diffusion in strategic games
 ### Education
 - Chess training partner with adjustable difficulty
 ## Technical Specifications
+- **Framework**: PyTorch with custom MCTS and denoising diffusion
 - **Training Method**: Self-play reinforcement learning
 - **Architecture**: Deep residual neural network
 - **Search Algorithm**: Monte Carlo Tree Search
 - **Evaluation**: Combined policy and value head outputs
+- **Reward System**: Game result, material, and move limit penalties
 ## Citation