Hangman DQN Baseline

A Deep Q-Network (DQN) agent trained to play Hangman, built on a Dueling Q-Network with a Transformer encoder.

Model Description

This model uses a Dueling Q-Network architecture with the following components (a rough code sketch follows the list):

  • Transformer Encoder: 2 layers, 4 heads, 128 model dimension
  • Dueling Architecture: Separate value and advantage streams
  • Input Features: Word pattern, tried letters, word length, priors
  • Output: Q-values for 26 possible letter choices
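
The exact layer layout is defined in hangman.rl.models.DuelingQNet; the sketch below only illustrates how a dueling head on top of a Transformer encoder can be wired. Layer names and shapes are assumptions, and for brevity it encodes only the word pattern rather than the full feature set (tried letters, length, priors).

import torch
import torch.nn as nn

class DuelingQNetSketch(nn.Module):
    """Illustrative dueling Q-network; the real DuelingQNet lives in hangman.rl.models."""

    def __init__(self, d_model=128, nhead=4, nlayers=2, ff_mult=4,
                 max_len=35, dropout=0.1, n_actions=26, vocab_size=29):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)              # pattern tokens (a-z, blank, pad, ...)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))   # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=ff_mult * d_model,
                                           dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=nlayers)
        # Dueling streams: scalar state value V(s) and per-letter advantages A(s, a)
        self.value_head = nn.Linear(d_model, 1)
        self.adv_head = nn.Linear(d_model, n_actions)

    def forward(self, pattern_ids):                                 # pattern_ids: (B, L) int64
        x = self.embed(pattern_ids) + self.pos[:, :pattern_ids.size(1)]
        h = self.encoder(x).mean(dim=1)                             # mean-pool into one state vector
        v, a = self.value_head(h), self.adv_head(h)
        return v + a - a.mean(dim=1, keepdim=True)                  # Q(s, a), shape (B, 26)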

Performance

Current Model Performance

  • Win Rate: 0.010
  • Best Baseline: CAND (0.870)
  • Performance Gap: 0.860

Baseline Comparison

Strategy | Win Rate | Description
CAND     | 0.866    | Candidate filtering (optimal)
IGX      | 0.127    | Information gain (exact)
POS      | 0.160    | Positional priors
LEN      | 0.112    | Length-based priors
IG       | 0.100    | Information gain
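
For intuition, the candidate-filtering idea behind CAND can be sketched as follows (an illustration only, not the repository's implementation): keep the dictionary words still consistent with the revealed pattern and the letters tried so far, then guess the unguessed letter that appears in the most remaining candidates.

from collections import Counter

def cand_guess(pattern, tried, dictionary):
    """Illustrative candidate-filtering guess.

    pattern: current word with '_' for unknown letters, e.g. '_a__man'
    tried:   set of letters already guessed
    """
    def consistent(word):
        if len(word) != len(pattern):
            return False
        for w, p in zip(word, pattern):
            if p != '_' and w != p:
                return False          # revealed letter must match
            if p == '_' and w in tried:
                return False          # a tried letter would already be revealed
        return True

    candidates = [w for w in dictionary if consistent(w)]
    counts = Counter(c for w in candidates for c in set(w) if c not in tried)
    return counts.most_common(1)[0][0] if counts else None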

Model Architecture

DuelingQNet(
    d_model=128,
    nhead=4,
    nlayers=2,
    ff_mult=4,
    max_len=35,
    dropout=0.1
)

Usage

import torch
from hangman.rl.models import DuelingQNet

# Load model
model = DuelingQNet(d_model=128, nhead=4, nlayers=2, ff_mult=4, max_len=35, dropout=0.1)
checkpoint = torch.load('bc_dueling_qnet.pt', map_location='cpu')
model.load_state_dict(checkpoint['model'])
model.eval()  # inference mode: disables dropout

# Use with the hangman environment
# (see simulator_playground.py for complete usage)
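
The encoding and forward signature are defined by DuelingQNet itself; assuming it maps an encoded state to a (1, 26) tensor of Q-values, greedy letter selection with masking of already-tried letters might look like the sketch below. The encoded-state format here is a placeholder, not the repository's actual interface.

import string
import torch

def pick_letter(model, state_features, tried):
    """Greedy letter choice from Q-values, skipping letters already tried."""
    with torch.no_grad():
        q = model(state_features).squeeze(0)                  # assumed output: (1, 26) Q-values
    for ch in tried:
        q[string.ascii_lowercase.index(ch)] = float('-inf')   # never repeat a guess
    return string.ascii_lowercase[q.argmax().item()]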

Training Details

  • Teacher Strategy: IGX (Information Gain Exact)
  • Training Method: Behavior Cloning (a minimal training step is sketched after this list)
  • Dataset: 227,300 words from training dictionary
  • Word Lengths: 4-12 characters
  • Max Tries: 6
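
Behavior cloning here amounts to supervised learning on the teacher's choices: the network's Q-values are pushed toward the letters the IGX teacher picked. A minimal sketch of one such update, assuming batched encoded states and teacher letter indices, is:

import torch
import torch.nn.functional as F

def bc_step(model, optimizer, states, teacher_letters):
    """One behavior-cloning update (batch format is an assumption for illustration).

    states:          encoded game states in whatever form DuelingQNet expects
    teacher_letters: (B,) int64 indices 0..25 of the IGX teacher's guesses
    """
    q = model(states)                           # (B, 26) Q-values used as logits
    loss = F.cross_entropy(q, teacher_letters)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()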

Evaluation

The model is evaluated with the simulator_playground.py script, which provides the following (a bare-bones win-rate loop is sketched after this list):

  • Performance comparison with heuristic baselines
  • Individual game decision analysis
  • Q-value pattern analysis
  • Comprehensive reporting
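
simulator_playground.py is the authoritative tool; for orientation only, a bare-bones win-rate loop over a hypothetical gym-style Hangman environment would look like:

def win_rate(env, policy, n_games=1000):
    """Estimate win rate over n_games.

    `env` (with reset() and step(letter) -> (state, done, won)) and `policy`
    are hypothetical stand-ins for the machinery in simulator_playground.py.
    """
    wins = 0
    for _ in range(n_games):
        state, tried, done = env.reset(), set(), False
        while not done:
            letter = policy(state, tried)
            tried.add(letter)
            state, done, won = env.step(letter)
        wins += int(won)
    return wins / n_games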

Files

  • bc_dueling_qnet.pt: PyTorch model checkpoint
  • simulator_playground.py: Evaluation and analysis tool
  • training_monitor.py: Training progress monitoring
  • performance_report.md: Detailed performance analysis

Limitations

  • Current Performance: Significantly underperforms the optimal CAND strategy
  • Training Data: Limited to 1,000 seeding episodes
  • Teacher Strategy: Uses suboptimal IGX instead of CAND
  • Architecture: May need enhancement for better performance

Future Improvements

  1. Retrain with CAND Teacher: Use the 86.6% win-rate CAND strategy as the teacher
  2. Enhanced Architecture: Add candidate priors and better feature fusion
  3. More Training Data: Increase seeding episodes and success rate
  4. Curriculum Learning: Progressive difficulty training

Citation

@misc{hangman-dqn-baseline,
  title={Hangman DQN: Deep Q-Network for Hangman Game},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/your-username/hangman-dqn-baseline}}
}

License

MIT License - see LICENSE file for details.
