|
--- |
|
license: apache-2.0 |
|
model_name: o1 |
|
repo: FlameF0X/o1 |
|
pipeline_tag: reinforcement-learning |
|
tags: |
|
- chess |
|
- reinforcement-learning |
|
- mcts |
|
- diffusion hybrid |
|
- alpha-zero-style |
|
- deep-learning |
|
- monte-carlo-tree-search |
|
- self-play |
|
- alphazero |
|
- research |
|
- games |
|
new_version: FlameF0X/o2 |
|
--- |
|
|
|
# o1 Chess Agent |
|
|
|
**A chess AI powered by deep reinforcement learning, MCTS, and denoising diffusion**
|
|
|
 |
|
*Image by Sora, OpenAI* |
|
|
|
## Model Description |
|
|
|
o1 is a modern chess agent that combines a deep residual neural network, Monte Carlo Tree Search (MCTS), and a denoising diffusion process for move selection. The model learns entirely by self-play, switching between black and white, and is trained with a sophisticated reward system that includes game outcome, material balance, and penalties for stalling. |
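
During search, MCTS scores each candidate move by combining the network's value estimate with a prior-weighted exploration bonus. The snippet below sketches the AlphaZero-style PUCT selection rule such a search typically uses; the node layout and the `c_puct` constant are illustrative assumptions rather than o1's documented code.

```python
# Minimal sketch of AlphaZero-style PUCT child selection. The per-child
# statistics (prior, visits, value_sum) and c_puct are assumptions made
# for illustration; the actual o1 implementation may differ.
import math

def puct_select(children: dict, c_puct: float = 1.5):
    """Pick the move maximizing Q(s, a) + U(s, a) among a node's children."""
    total_visits = sum(c["visits"] for c in children.values())
    best_move, best_score = None, -math.inf
    for move, child in children.items():
        q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        u = c_puct * child["prior"] * math.sqrt(total_visits + 1) / (1 + child["visits"])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move
```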
|
|
|
## Key Features |
|
|
|
### **Advanced Architecture**
|
- **Hybrid Diffusion + Transformer Model**: Combines deep residual blocks, a transformer encoder, and a denoising diffusion process for a robust move policy and long-range reasoning (a minimal network sketch follows this list)
|
- **Deep Residual Neural Network**: 20+ residual blocks for strong pattern recognition |
|
- **Monte Carlo Tree Search (MCTS)**: Looks ahead by simulating many possible futures |
|
- **Denoising Diffusion**: Backward (reverse) diffusion process for creative move selection |
|
- **Dual-Color Training**: Learns optimal play as both white and black |
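
To make the hybrid-model bullet concrete: residual blocks handle local patterns, a transformer encoder over the 64 squares provides global context, and separate heads emit the move policy and position value. The sketch below follows that structure; the layer sizes, the AlphaZero-style move space of 4672, and the omission of the diffusion head are simplifying assumptions, not the actual o1 definition.

```python
# Simplified hybrid policy-value network: residual conv blocks for local
# patterns plus a transformer encoder over the 64 squares for global context.
# Channel counts, depths, and the move-space size are assumptions; the
# diffusion head is omitted for brevity.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=128):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)

class HybridPolicyValueNet(nn.Module):
    def __init__(self, in_ch=20, ch=128, n_res=20, n_moves=4672):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU())
        self.res = nn.Sequential(*[ResBlock(ch) for _ in range(n_res)])
        enc_layer = nn.TransformerEncoderLayer(d_model=ch, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.policy = nn.Linear(ch * 64, n_moves)   # move logits
        self.value = nn.Sequential(nn.Linear(ch * 64, 256), nn.ReLU(), nn.Linear(256, 1), nn.Tanh())

    def forward(self, board_tensor):                 # (B, 20, 8, 8)
        h = self.res(self.stem(board_tensor))        # (B, ch, 8, 8)
        seq = h.flatten(2).transpose(1, 2)           # (B, 64, ch) -- one token per square
        seq = self.transformer(seq)
        flat = seq.flatten(1)
        return self.policy(flat), self.value(flat)
```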
|
|
|
### **Self-Play Learning**
|
- **Continuous Self-Improvement**: Learns by playing against itself |
|
- **Balanced Reward System**: Rewards for wins, penalties for losses, and material-based rewards (e.g., +9 for capturing a queen, -9 for losing one); see the reward sketch after this list
|
- **Move Limit Penalty**: Extra penalty if the game ends by move limit |
|
- **Experience Replay**: Trains from a buffer of past games |
|
- **Early Stopping**: Prevents overfitting during training |
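
The sketch below illustrates the reward shaping described in this list: terminal game result, material swings, and an extra penalty when the game is cut off by the move limit. The win/loss and move-limit constants are illustrative assumptions; the queen value of 9 matches the example above.

```python
# Sketch of the shaped reward: game outcome, material swings, and a penalty
# when the game ends by hitting the move limit. The win/loss and move-limit
# constants are assumptions for illustration; the queen value (9) matches
# the +9 / -9 example in this card.
import chess

PIECE_VALUE = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
               chess.ROOK: 5, chess.QUEEN: 9}

def capture_value(board: chess.Board, move: chess.Move) -> int:
    """Material gained by the side playing `move`; the opponent receives the negative."""
    captured = board.piece_at(move.to_square)        # (en passant omitted for brevity)
    return PIECE_VALUE.get(captured.piece_type, 0) if captured else 0

def terminal_reward(board: chess.Board, color: bool, hit_move_limit: bool,
                    win: float = 10.0, move_limit_penalty: float = -2.0) -> float:
    """Reward assigned to `color` when the game ends."""
    if hit_move_limit:
        return move_limit_penalty                    # extra penalty for stalling
    outcome = board.outcome()
    if outcome is None or outcome.winner is None:    # ongoing game or draw
        return 0.0
    return win if outcome.winner == color else -win
```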
|
|
|
### **Experimental & Evaluation Features**
|
- **Diffusion-Based Exploration**: Denoising process for creative and robust move selection |
|
- **Transformer Block**: Integrates a transformer encoder for global board context |
|
- **ELO Evaluation**: Built-in script to estimate model strength via ELO rating (see the estimation sketch after this list)
|
- **Top-N Move Restriction**: Optionally restricts move selection to the top-N MCTS candidates |
|
- **No Handcrafted Heuristics**: All strategy is learned, not hardcoded |
|
- **Advanced Board/Move Encoding**: Includes en passant, repetition, 50-move rule, and robust move encoding/decoding |
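
The ELO evaluation script itself is not reproduced in this card; the sketch below shows the standard logistic-model calculation that turns head-to-head results against a reference opponent into an estimated rating, which is what such a script typically computes. The reference rating in the example is hypothetical.

```python
# Sketch of Elo estimation from match results against a reference opponent
# with a known rating, using the standard logistic expected-score model.
# This is an illustration, not the repository's evaluation script.
import math

def elo_from_score(score: float, opponent_elo: float = 1500.0) -> float:
    """Estimate a rating from the fraction of points scored against `opponent_elo`."""
    score = min(max(score, 1e-6), 1 - 1e-6)          # avoid log(0) at 0% / 100%
    return opponent_elo + 400.0 * math.log10(score / (1.0 - score))

# Example: 12 points from 100 games against a hypothetical 500-rated baseline
print(round(elo_from_score(12 / 100, opponent_elo=500.0), 1))
```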
|
|
|
### **Modern UI**
|
- **Streamlit Web App**: Drag-and-drop chess play against the agent, with Hugging Face Hub model loading and browser UI |
|
|
|
## Architecture Details |
|
|
|
o1 now features a hybrid neural architecture: |
|
- **Hybrid Policy-Value Network**: Residual blocks for local patterns, transformer encoder for global context, and a denoising diffusion process for robust move selection |
|
- **Advanced Board Representation**: 20 input channels including piece types, castling rights, move count, en passant, repetition, and 50-move rule (see the encoding sketch after this list)
|
- **MCTS Integration**: Tree search guided by neural network evaluations |
|
- **Self-Play Pipeline**: Automated training through continuous game generation |
|
- **Material Reward System**: Rewards/penalties for capturing or losing pieces |
|
- **ELO Evaluation**: Play games between agents and estimate ELO rating |
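
One plausible layout for the 20-channel board tensor listed above is 12 piece planes (6 piece types × 2 colors), 4 castling-rights planes, and one plane each for en passant, move count, repetition, and the 50-move rule. The sketch below follows that layout; the exact plane ordering and normalization in o1 may differ.

```python
# Sketch of a 20-channel board encoding consistent with the channel list above.
# Plane ordering and normalization constants are assumptions; the actual o1
# encoder may differ.
import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    planes = np.zeros((20, 8, 8), dtype=np.float32)
    # 0-11: one plane per (color, piece type)
    for square, piece in board.piece_map().items():
        idx = (0 if piece.color == chess.WHITE else 6) + piece.piece_type - 1
        planes[idx, square // 8, square % 8] = 1.0
    # 12-15: castling rights as constant planes
    planes[12] = float(board.has_kingside_castling_rights(chess.WHITE))
    planes[13] = float(board.has_queenside_castling_rights(chess.WHITE))
    planes[14] = float(board.has_kingside_castling_rights(chess.BLACK))
    planes[15] = float(board.has_queenside_castling_rights(chess.BLACK))
    # 16: en passant target square, if any
    if board.ep_square is not None:
        planes[16, board.ep_square // 8, board.ep_square % 8] = 1.0
    # 17: move count, 18: repetition flag, 19: 50-move-rule clock (normalized)
    planes[17] = board.fullmove_number / 100.0
    planes[18] = float(board.is_repetition(2))
    planes[19] = board.halfmove_clock / 100.0
    return planes
```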
|
|
|
## Usage: Play Against o1 |
|
|
|
You can load the model from the Hugging Face Hub and play against o1, or use the Streamlit web app. The snippet below is a minimal loading sketch; the checkpoint filename and the agent wrapper are assumptions, not documented in this card:
|
|
|
```python
# Minimal loading sketch -- the checkpoint filename and the agent/network
# classes are assumptions, not documented in this card; adapt them to the
# files actually published in FlameF0X/o1.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="FlameF0X/o1", filename="model.pt")  # assumed filename
state_dict = torch.load(ckpt_path, map_location="cpu")

# import chess
# agent = O1Agent(state_dict)          # hypothetical wrapper: network + MCTS + diffusion
# board = chess.Board()
# board.push(agent.select_move(board)) # hypothetical move-selection call
```
|
|
|
Or launch the web UI: |
|
|
|
```powershell |
|
pip install streamlit python-chess torch huggingface_hub |
|
streamlit run app.py |
|
``` |
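
The repository's app.py is not reproduced here; the sketch below shows one way such a Streamlit front end could look. The agent-loading and reply calls are hypothetical placeholders, and the real app additionally offers drag-and-drop play and Hub model loading.

```python
# app.py -- minimal Streamlit front-end sketch. The agent loading and reply
# generation are hypothetical placeholders, not the repository's actual code.
import chess
import chess.svg
import streamlit as st

if "board" not in st.session_state:
    st.session_state.board = chess.Board()
board = st.session_state.board

st.title("Play against o1")
st.markdown(chess.svg.board(board, size=400), unsafe_allow_html=True)  # render position as SVG

move_uci = st.text_input("Your move in UCI notation (e.g. e2e4)")
if st.button("Play") and move_uci:
    try:
        board.push_uci(move_uci)
        # reply = agent.select_move(board)   # hypothetical o1 reply via MCTS
        # board.push(reply)
    except ValueError:
        st.warning("Illegal or malformed move")
    st.rerun()
```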
|
|
|
## Requirements |
|
|
|
- torch |
|
- numpy |
|
- python-chess |
|
- huggingface_hub |
|
- streamlit (for web UI) |
|
|
|
## Performance |
|
|
|
- Trained on millions of self-play games |
|
- Demonstrates tactical awareness learned purely from self-play
|
- Capable of long-term strategic planning through MCTS |
|
- Robust performance across all game phases |
|
- Learns to value material and avoid stalling |
|
- Estimated ELO of 257.3 |
|
## Applications |
|
|
|
### Research |
|
- Study emergent chess strategies through AI self-play |
|
- Analyze the effectiveness of different neural architectures for game playing |
|
- Experiment with denoising diffusion in strategic games |
|
|
|
### Education |
|
- Chess training partner with adjustable difficulty |
|
- Analysis tool for game improvement |
|
- Demonstration of reinforcement learning principles |
|
|
|
### Competition |
|
- Tournament-ready chess engine |
|
- Benchmark for other chess AI systems |
|
- Platform for testing new chess AI techniques |
|
|
|
## Technical Specifications |
|
|
|
- **Framework**: PyTorch with custom MCTS, hybrid diffusion+transformer model |
|
- **Training Method**: Self-play reinforcement learning with early stopping |
|
- **Architecture**: Residual + Transformer + Diffusion |
|
- **Search Algorithm**: Monte Carlo Tree Search |
|
- **Evaluation**: Combined policy and value head outputs, ELO script |
|
- **Reward System**: Game result, material, and move limit penalties |
|
- **Board/Move Encoding**: 20-channel board tensor, robust move encoding/decoding |
|
- **UI**: Streamlit drag-and-drop web app |
|
|
|
## Citation |
|
|
|
If you use o1 Chess Agent in your research, please cite: |
|
|
|
```bibtex
@misc{o1-chess-agent,
  title={o1 Chess Agent: Deep Reinforcement Learning for Strategic Game Play},
  author={FlameF0X},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/FlameF0X/o1}}
}
```