Hangman DQN Baseline

A Deep Q-Network (DQN) agent trained to play Hangman, built on a Dueling Q-Network with a Transformer encoder.

Model Description

This model uses a Dueling Q-Network architecture with the following components (a rough code sketch follows the list):

  • Transformer Encoder: 2 layers, 4 heads, 128 model dimension
  • Dueling Architecture: Separate value and advantage streams
  • Input Features: Word pattern, tried letters, word length, priors
  • Output: Q-values for 26 possible letter choices
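
The exact layer layout is defined in hangman.rl.models.DuelingQNet; the sketch below only illustrates how a dueling head on top of a Transformer encoder can be wired. Layer names and shapes are assumptions, and for brevity it encodes only the word pattern rather than the full feature set (tried letters, length, priors).

import torch
import torch.nn as nn

class DuelingQNetSketch(nn.Module):
    """Illustrative dueling Q-network; the real DuelingQNet lives in hangman.rl.models."""

    def __init__(self, d_model=128, nhead=4, nlayers=2, ff_mult=4,
                 max_len=35, dropout=0.1, n_actions=26, vocab_size=29):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)              # pattern tokens (a-z, blank, pad, ...)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))   # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=ff_mult * d_model,
                                           dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=nlayers)
        # Dueling streams: scalar state value V(s) and per-letter advantages A(s, a)
        self.value_head = nn.Linear(d_model, 1)
        self.adv_head = nn.Linear(d_model, n_actions)

    def forward(self, pattern_ids):                                 # pattern_ids: (B, L) int64
        x = self.embed(pattern_ids) + self.pos[:, :pattern_ids.size(1)]
        h = self.encoder(x).mean(dim=1)                             # mean-pool into one state vector
        v, a = self.value_head(h), self.adv_head(h)
        return v + a - a.mean(dim=1, keepdim=True)                  # Q(s, a), shape (B, 26)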

Performance

Current Model Performance

  • Win Rate: 0.010
  • Best Baseline: CAND (0.870)
  • Performance Gap: 0.860

Baseline Comparison

Strategy | Win Rate | Description
CAND     | 0.866    | Candidate filtering (optimal)
IGX      | 0.127    | Information gain (exact)
POS      | 0.160    | Positional priors
LEN      | 0.112    | Length-based priors
IG       | 0.100    | Information gain
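
For intuition, the candidate-filtering idea behind CAND can be sketched as follows (an illustration only, not the repository's implementation): keep the dictionary words still consistent with the revealed pattern and the letters tried so far, then guess the unguessed letter that appears in the most remaining candidates.

from collections import Counter

def cand_guess(pattern, tried, dictionary):
    """Illustrative candidate-filtering guess.

    pattern: current word with '_' for unknown letters, e.g. '_a__man'
    tried:   set of letters already guessed
    """
    def consistent(word):
        if len(word) != len(pattern):
            return False
        for w, p in zip(word, pattern):
            if p != '_' and w != p:
                return False          # revealed letter must match
            if p == '_' and w in tried:
                return False          # a tried letter would already be revealed
        return True

    candidates = [w for w in dictionary if consistent(w)]
    counts = Counter(c for w in candidates for c in set(w) if c not in tried)
    return counts.most_common(1)[0][0] if counts else None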

Model Architecture

DuelingQNet(
    d_model=128,
    nhead=4,
    nlayers=2,
    ff_mult=4,
    max_len=35,
    dropout=0.1
)

Usage

import torch
from hangman.rl.models import DuelingQNet

# Load model
model = DuelingQNet(d_model=128, nhead=4, nlayers=2, ff_mult=4, max_len=35, dropout=0.1)
checkpoint = torch.load('bc_dueling_qnet.pt', map_location='cpu')
model.load_state_dict(checkpoint['model'])
model.eval()  # inference mode: disables dropout

# Use with the hangman environment
# (see simulator_playground.py for complete usage)
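
The encoding and forward signature are defined by DuelingQNet itself; assuming it maps an encoded state to a (1, 26) tensor of Q-values, greedy letter selection with masking of already-tried letters might look like the sketch below. The encoded-state format here is a placeholder, not the repository's actual interface.

import string
import torch

def pick_letter(model, state_features, tried):
    """Greedy letter choice from Q-values, skipping letters already tried."""
    with torch.no_grad():
        q = model(state_features).squeeze(0)                  # assumed output: (1, 26) Q-values
    for ch in tried:
        q[string.ascii_lowercase.index(ch)] = float('-inf')   # never repeat a guess
    return string.ascii_lowercase[q.argmax().item()]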

Training Details

  • Teacher Strategy: IGX (Information Gain Exact)
  • Training Method: Behavior Cloning (a minimal training step is sketched after this list)
  • Dataset: 227,300 words from training dictionary
  • Word Lengths: 4-12 characters
  • Max Tries: 6
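
Behavior cloning here amounts to supervised learning on the teacher's choices: the network's Q-values are pushed toward the letters the IGX teacher picked. A minimal sketch of one such update, assuming batched encoded states and teacher letter indices, is:

import torch
import torch.nn.functional as F

def bc_step(model, optimizer, states, teacher_letters):
    """One behavior-cloning update (batch format is an assumption for illustration).

    states:          encoded game states in whatever form DuelingQNet expects
    teacher_letters: (B,) int64 indices 0..25 of the IGX teacher's guesses
    """
    q = model(states)                           # (B, 26) Q-values used as logits
    loss = F.cross_entropy(q, teacher_letters)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()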

Evaluation

The model is evaluated with the simulator_playground.py script, which provides the following (a bare-bones win-rate loop is sketched after this list):

  • Performance comparison with heuristic baselines
  • Individual game decision analysis
  • Q-value pattern analysis
  • Comprehensive reporting
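
simulator_playground.py is the authoritative tool; for orientation only, a bare-bones win-rate loop over a hypothetical gym-style Hangman environment would look like:

def win_rate(env, policy, n_games=1000):
    """Estimate win rate over n_games.

    `env` (with reset() and step(letter) -> (state, done, won)) and `policy`
    are hypothetical stand-ins for the machinery in simulator_playground.py.
    """
    wins = 0
    for _ in range(n_games):
        state, tried, done = env.reset(), set(), False
        while not done:
            letter = policy(state, tried)
            tried.add(letter)
            state, done, won = env.step(letter)
        wins += int(won)
    return wins / n_games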

Files

  • bc_dueling_qnet.pt: PyTorch model checkpoint
  • simulator_playground.py: Evaluation and analysis tool
  • training_monitor.py: Training progress monitoring
  • performance_report.md: Detailed performance analysis

Limitations

  • Current Performance: Significantly underperforms the optimal CAND strategy
  • Training Data: Limited to 1,000 seeding episodes
  • Teacher Strategy: Uses suboptimal IGX instead of CAND
  • Architecture: May need enhancement for better performance

Future Improvements

  1. Retrain with CAND Teacher: Use the 86.6% win-rate CAND strategy as the teacher
  2. Enhanced Architecture: Add candidate priors and better feature fusion
  3. More Training Data: Increase seeding episodes and success rate
  4. Curriculum Learning: Progressive difficulty training

Citation

@misc{hangman-dqn-baseline,
  title={Hangman DQN: Deep Q-Network for Hangman Game},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/your-username/hangman-dqn-baseline}}
}

License

MIT License - see LICENSE file for details.
