# Hangman DQN Baseline

A Deep Q-Network (DQN) model trained to play Hangman, built on a Dueling Q-Network architecture with a Transformer encoder.
## Model Description

This model uses a Dueling Q-Network architecture with the following components (a code sketch follows the list):
- Transformer Encoder: 2 layers, 4 heads, 128 model dimension
- Dueling Architecture: Separate value and advantage streams
- Input Features: Word pattern, tried letters, word length, priors
- Output: Q-values for 26 possible letter choices
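The exact implementation lives in `hangman.rl.models.DuelingQNet`; the sketch below only illustrates the general shape. The hyperparameters mirror this card, but the input layout (token ids for the masked pattern, a 26-dim tried-letter mask, a scalar word length) is an assumption, not the repository's exact interface.

```python
import torch
import torch.nn as nn

class DuelingQNetSketch(nn.Module):
    """Illustrative dueling Q-network with a Transformer encoder.

    Hyperparameters follow the model card (d_model=128, nhead=4, nlayers=2);
    the feature layout here is an assumption, not the repo's exact API.
    """

    def __init__(self, vocab_size=29, d_model=128, nhead=4, nlayers=2,
                 ff_mult=4, max_len=35, dropout=0.1, n_letters=26):
        super().__init__()
        # Token embedding for the masked word pattern (letters, blank, padding).
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=ff_mult * d_model,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=nlayers)
        # Extra features: tried-letter mask (26) + word length (1).
        self.fuse = nn.Linear(d_model + n_letters + 1, d_model)
        # Dueling heads: state value V(s) and per-letter advantages A(s, a).
        self.value = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                   nn.Linear(d_model, 1))
        self.advantage = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                       nn.Linear(d_model, n_letters))

    def forward(self, pattern_ids, tried_mask, word_len):
        x = self.embed(pattern_ids) + self.pos[:, :pattern_ids.size(1)]
        h = self.encoder(x).mean(dim=1)            # pool over positions
        h = torch.relu(self.fuse(torch.cat([h, tried_mask, word_len], dim=-1)))
        v = self.value(h)                          # (B, 1)
        a = self.advantage(h)                      # (B, 26)
        # Standard dueling aggregation: Q = V + A - mean(A).
        return v + a - a.mean(dim=-1, keepdim=True)
```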
## Performance

### Current Model Performance

- Win Rate: 0.010
- Best Baseline: CAND (0.870)
- Performance Gap: 0.860

### Baseline Comparison
| Strategy | Win Rate | Description |
|---|---|---|
| CAND | 0.866 | Candidate filtering (optimal) |
| IGX | 0.127 | Information gain (exact) |
| POS | 0.160 | Positional priors |
| LEN | 0.112 | Length-based priors |
| IG | 0.100 | Information gain |
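For reference, the CAND baseline filters the dictionary down to the words still consistent with the revealed pattern and guesses the most frequent remaining letter. The snippet below is a minimal illustration of that idea, not the repository's implementation; the `dictionary` argument and the `"_"` pattern format are assumptions.

```python
from collections import Counter

def cand_guess(pattern, tried, dictionary):
    """Pick a letter by candidate filtering (illustrative, not the repo code).

    pattern    -- e.g. "_pp_e" with "_" for unknown positions
    tried      -- set of letters already guessed
    dictionary -- iterable of lowercase words
    """
    candidates = [
        w for w in dictionary
        if len(w) == len(pattern)
        and all(p == "_" or p == c for p, c in zip(pattern, w))
        # an unknown position cannot hold a letter that was already tried
        and all(c not in tried for p, c in zip(pattern, w) if p == "_")
    ]
    counts = Counter(c for w in candidates for c in set(w) if c not in tried)
    # Fall back to a common letter if filtering leaves no candidates.
    return counts.most_common(1)[0][0] if counts else "e"

# Example: next guess for "_pp_e" after trying {"p", "e"}.
print(cand_guess("_pp_e", {"p", "e"}, ["apple", "ample", "apply"]))
```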
## Model Architecture

```python
DuelingQNet(
    d_model=128,
    nhead=4,
    nlayers=2,
    ff_mult=4,
    max_len=35,
    dropout=0.1
)
```
## Usage

```python
import torch
from hangman.rl.models import DuelingQNet

# Load the trained checkpoint
model = DuelingQNet(d_model=128, nhead=4, nlayers=2, ff_mult=4, max_len=35, dropout=0.1)
checkpoint = torch.load('bc_dueling_qnet.pt', map_location='cpu')
model.load_state_dict(checkpoint['model'])
model.eval()  # switch to inference mode

# Use with the hangman environment
# (see simulator_playground.py for complete usage)
```
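At inference time the usual step is to mask letters that have already been tried and take the argmax over the remaining Q-values. A minimal sketch, using random Q-values as a stand-in for the model's output because the repository's exact state-encoding pipeline is not shown here:

```python
import torch

# Illustrative letter selection from Q-values; the Q-values below are random
# placeholders standing in for a real forward pass through the model.
tried = {"p", "e"}
tried_mask = torch.zeros(1, 26)
for ch in tried:
    tried_mask[0, ord(ch) - ord("a")] = 1.0

q_values = torch.randn(1, 26)  # stand-in for the model's (1, 26) output

# Never re-guess a letter: mask tried letters before taking the argmax.
q_values = q_values.masked_fill(tried_mask.bool(), float("-inf"))
next_letter = chr(ord("a") + int(q_values.argmax(dim=-1)))
print(next_letter)
```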
## Training Details
- Teacher Strategy: IGX (Information Gain Exact)
- Training Method: Behavior Cloning (see the sketch after this list)
- Dataset: 227,300 words from training dictionary
- Word Lengths: 4-12 characters
- Max Tries: 6
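Behavior cloning here means supervised imitation: the network's Q-values are treated as logits and fit to the IGX teacher's letter choices with cross-entropy, rather than being learned from game reward. A minimal sketch of one such update, with hypothetical argument names that match the architecture sketch above rather than the repository's actual API:

```python
import torch.nn.functional as F

def bc_step(model, optimizer, pattern_ids, tried_mask, word_len, teacher_actions):
    """One behavior-cloning update (illustrative; argument names are hypothetical).

    teacher_actions -- LongTensor of teacher letter indices in [0, 25]
    """
    q_values = model(pattern_ids, tried_mask, word_len)   # (B, 26)
    # Imitate the teacher: cross-entropy between Q-values (as logits) and actions.
    loss = F.cross_entropy(q_values, teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```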
## Evaluation

The model is evaluated with the `simulator_playground.py` script, which provides:
- Performance comparison with heuristic baselines
- Individual game decision analysis
- Q-value pattern analysis
- Comprehensive reporting
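Win rate is simply the fraction of simulated games the agent solves within the allowed number of wrong guesses. A toy measurement loop, assuming a hypothetical `play_game(agent, word, max_tries)` callable that returns True when the word is solved (the real simulator lives in `simulator_playground.py`):

```python
def win_rate(agent, words, play_game, max_tries=6):
    """Fraction of games won; play_game is a stand-in for the real simulator."""
    wins = sum(bool(play_game(agent, word, max_tries)) for word in words)
    return wins / len(words)
```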
## Files

- `bc_dueling_qnet.pt`: PyTorch model checkpoint
- `simulator_playground.py`: Evaluation and analysis tool
- `training_monitor.py`: Training progress monitoring
- `performance_report.md`: Detailed performance analysis
## Limitations
- Current Performance: Significantly underperforms optimal CAND strategy
- Training Data: Limited to 1,000 seeding episodes
- Teacher Strategy: Uses suboptimal IGX instead of CAND
- Architecture: May need enhancement for better performance
## Future Improvements
- Retrain with CAND Teacher: Use the 86.6% win-rate CAND strategy as the teacher
- Enhanced Architecture: Add candidate priors and better feature fusion
- More Training Data: Increase seeding episodes and success rate
- Curriculum Learning: Progressive difficulty training
## Citation

```bibtex
@misc{hangman-dqn-baseline,
  title={Hangman DQN: Deep Q-Network for Hangman Game},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/your-username/hangman-dqn-baseline}}
}
```
## License
MIT License - see LICENSE file for details.