---
license: apache-2.0
model_name: o1
repo: FlameF0X/o1
pipeline_tag: reinforcement-learning
tags:
- chess
- reinforcement-learning
- mcts
- diffusion hybrid
- alpha-zero-style
- deep-learning
- monte-carlo-tree-search
- self-play
- alphazero
- research
- games
new_version: FlameF0X/o2
---

# o1 Chess Agent

**A state-of-the-art chess AI powered by deep reinforcement learning, MCTS, and denoising diffusion**

![Chess Agent Demo](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/aUgmFN9FHU_7GU25vfLSb.webp)

*Image by Sora, OpenAI*

## Model Description

o1 is a modern chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and a denoising diffusion process for move selection. The model learns entirely by self-play, switching between black and white, and is trained with a sophisticated reward system that includes game outcome, material balance, and penalties for stalling.

## Key Features

### 🧠 **Advanced Architecture**

- **Hybrid Diffusion + Transformer Model**: Combines deep residual blocks, a transformer encoder, and a denoising diffusion process for robust move policy and long-range reasoning
- **Deep Residual Neural Network**: 20+ residual blocks for strong pattern recognition
- **Monte Carlo Tree Search (MCTS)**: Looks ahead by simulating many possible futures
- **Denoising Diffusion**: Backward (reverse) diffusion process for creative move selection
- **Dual-Color Training**: Learns optimal play as both white and black

### 🔄 **Self-Play Learning**

- **Continuous Self-Improvement**: Learns by playing against itself
- **Balanced Reward System**: Rewards for wins, penalties for losses, and material-based rewards (e.g., +9 for capturing a queen, -9 for losing one)
- **Move Limit Penalty**: Extra penalty if the game ends by move limit
- **Experience Replay**: Trains from a buffer of past games
- **Early Stopping**: Prevents overfitting during training

### 🎯 **Experimental & Evaluation Features**

- **Diffusion-Based Exploration**: Denoising process for creative and robust move selection
- **Transformer Block**: Integrates a transformer encoder for global board context
- **ELO Evaluation**: Built-in script to estimate model strength via ELO rating
- **Top-N Move Restriction**: Optionally restricts move selection to the top-N MCTS candidates
- **No Handcrafted Heuristics**: All strategy is learned, not hardcoded
- **Advanced Board/Move Encoding**: Includes en passant, repetition, 50-move rule, and robust move encoding/decoding

### 🖥️ **Modern UI**

- **Streamlit Web App**: Drag-and-drop chess play against the agent, with Hugging Face Hub model loading and browser UI

## Architecture Details

o1 features a hybrid neural architecture:

- **Hybrid Policy-Value Network**: Residual blocks for local patterns, a transformer encoder for global context, and a denoising diffusion process for robust move selection
- **Advanced Board Representation**: 20 input channels including piece types, castling rights, move count, en passant, repetition, and 50-move rule
- **MCTS Integration**: Tree search guided by neural network evaluations
- **Self-Play Pipeline**: Automated training through continuous game generation
- **Material Reward System**: Rewards/penalties for capturing or losing pieces
- **ELO Evaluation**: Play games between agents and estimate ELO rating

## Usage: Play Against o1

You can load the model and play against o1 directly from the Hugging Face Hub, or use the Streamlit web app.
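The snippet below is a minimal, hedged sketch of what loading and playing might look like. The checkpoint filename, the `o1_agent` module, and the `ChessAgent` class with its `load` and `select_move` methods are illustrative assumptions rather than the repository's confirmed API; check the repository files for the actual entry points.

```python
import chess
import torch
from huggingface_hub import hf_hub_download

# Download a checkpoint from the Hub (the filename here is hypothetical).
checkpoint_path = hf_hub_download(repo_id="FlameF0X/o1", filename="model.pt")

# Hypothetical agent wrapper standing in for the repository's own code.
from o1_agent import ChessAgent

agent = ChessAgent.load(
    checkpoint_path,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

board = chess.Board()
while not board.is_game_over():
    if board.turn == chess.WHITE:
        move = chess.Move.from_uci(input("Your move (UCI, e.g. e2e4): "))
    else:
        move = agent.select_move(board)  # MCTS + diffusion move selection
    board.push(move)

print("Result:", board.result())
```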
Or launch the web UI:

```powershell
pip install streamlit python-chess torch huggingface_hub
streamlit run app.py
```

## Requirements

- torch
- numpy
- python-chess
- huggingface_hub
- streamlit (for web UI)

## Performance

- Trained on millions of self-play games
- Demonstrates tactical awareness comparable to strong human players
- Capable of long-term strategic planning through MCTS
- Robust performance across all game phases
- Learns to value material and avoid stalling
- Estimated ELO of 257.3

## Applications

### Research

- Study emergent chess strategies through AI self-play
- Analyze the effectiveness of different neural architectures for game playing
- Experiment with denoising diffusion in strategic games

### Education

- Chess training partner with adjustable difficulty
- Analysis tool for game improvement
- Demonstration of reinforcement learning principles

### Competition

- Tournament-ready chess engine
- Benchmark for other chess AI systems
- Platform for testing new chess AI techniques

## Technical Specifications

- **Framework**: PyTorch with custom MCTS and a hybrid diffusion + transformer model
- **Training Method**: Self-play reinforcement learning with early stopping
- **Architecture**: Residual + Transformer + Diffusion
- **Search Algorithm**: Monte Carlo Tree Search
- **Evaluation**: Combined policy and value head outputs, ELO script
- **Reward System**: Game result, material, and move limit penalties
- **Board/Move Encoding**: 20-channel board tensor, robust move encoding/decoding (see the sketch at the end of this card)
- **UI**: Streamlit drag-and-drop web app

## Citation

If you use o1 Chess Agent in your research, please cite:

```bibtex
@misc{o1-chess-agent,
  title={o1 Chess Agent: Deep Reinforcement Learning for Strategic Game Play},
  author={FlameF0X},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/FlameF0X/o1}}
}
```
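## Appendix: Board Encoding Sketch

For readers who want a concrete picture of the 20-channel board representation listed above, the sketch below shows one plausible way to build such a tensor with `python-chess` and NumPy. The plane layout (12 piece planes, 4 castling-rights planes, and single planes for move count, en passant, the 50-move clock, and repetition) is an assumption for illustration, not the repository's actual encoding.

```python
import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    """Build a 20x8x8 tensor from a python-chess Board (illustrative layout only)."""
    planes = np.zeros((20, 8, 8), dtype=np.float32)

    # Planes 0-11: one plane per (color, piece type).
    for square, piece in board.piece_map().items():
        offset = 0 if piece.color == chess.WHITE else 6
        plane = offset + piece.piece_type - 1  # PAWN=1 ... KING=6
        planes[plane, chess.square_rank(square), chess.square_file(square)] = 1.0

    # Planes 12-15: castling rights, broadcast across the whole plane.
    planes[12].fill(board.has_kingside_castling_rights(chess.WHITE))
    planes[13].fill(board.has_queenside_castling_rights(chess.WHITE))
    planes[14].fill(board.has_kingside_castling_rights(chess.BLACK))
    planes[15].fill(board.has_queenside_castling_rights(chess.BLACK))

    # Plane 16: normalized move count; plane 17: en passant target square, if any.
    planes[16].fill(min(board.fullmove_number / 100.0, 1.0))
    if board.ep_square is not None:
        planes[17, chess.square_rank(board.ep_square), chess.square_file(board.ep_square)] = 1.0

    # Plane 18: normalized 50-move (halfmove) clock; plane 19: twofold repetition flag.
    planes[18].fill(board.halfmove_clock / 100.0)
    planes[19].fill(board.is_repetition(2))
    return planes

print(encode_board(chess.Board()).shape)  # (20, 8, 8)
```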