# Local LLM Clue Generation Prototype

This prototype integrates the existing thematic word generation with local LLM-based clue generation using `google/flan-t5-small`.
## Files

- `llm_clue_generator.py`: Core LLM clue generator using flan-t5-small
- `test_clue_generation.py`: Integration test script combining word and clue generation
- `requirements.txt`: Dependencies for the prototype
- `README_clue_generation.md`: This documentation
## Quick Start

Install dependencies:

```bash
pip install -r requirements.txt
```

Test the LLM clue generator on its own:

```bash
python llm_clue_generator.py
```

Test the full integration (word + clue generation):

```bash
python test_clue_generation.py
```
## Key Features

### LLM Clue Generator (`llm_clue_generator.py`)

- Uses `google/flan-t5-small` (~250MB), optimized for CPU inference
- Generates multiple clue candidates and selects the best one
- Supports different clue styles: definition, trivia, description, category
- Includes fallback templates when LLM generation fails
- Batch processing capability for efficiency
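The "generate candidates, pick the best" step can be sketched as follows. `pick_best_clue` and its scoring heuristic are illustrative stand-ins, not the actual logic in `llm_clue_generator.py`:

```python
# Hypothetical sketch of candidate selection: reject clues that reveal
# the answer, then prefer the fullest clue within the length cap.
from typing import List, Optional

def pick_best_clue(word: str, candidates: List[str],
                   max_length: int = 50) -> Optional[str]:
    """Return the highest-scoring candidate, or None if none is usable."""
    def score(clue: str) -> float:
        if word.lower() in clue.lower():
            return float("-inf")           # a clue must never contain the answer
        s = float(min(len(clue), max_length))  # prefer fuller clues up to the cap
        if len(clue) > max_length:
            s -= len(clue) - max_length    # penalize overlong output
        return s

    usable = [c for c in candidates if score(c) > float("-inf")]
    return max(usable, key=score) if usable else None
```

Returning `None` when every candidate is unusable is what lets the fallback templates kick in.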
### Integration Test (`test_clue_generation.py`)

- **Single Topic Test**: Generate words and clues for one topic
- **Multi-Topic Test**: Handle multiple themes with contextual clues
- **Custom Sentence Test**: Turn a personal sentence into themed word-clue pairs
- **Difficulty Comparison**: Same words with easy/medium/hard clue complexity
- **Performance Analysis**: Speed and memory usage metrics
## Expected Performance (HF Spaces)

- Initialization: ~30-60s (model download + word embeddings)
- Word generation: ~1-3s for 10 words
- Clue generation: ~2-5s per clue, depending on complexity
- Memory usage: ~1-2GB (model + embeddings + vocabulary)
## Sample Output

```text
Topic: 'animals'
1. ELEPHANT (8 letters) - Large mammal with trunk and tusks
2. TIGER (5 letters) - Striped big cat from Asia
3. PENGUIN (7 letters) - Flightless Antarctic bird
...
```
## Integration with Backend

To integrate with the main crossword application:

- **Add to ThematicWordService**: Include `LLMClueGenerator` as an optional component
- **Async support**: Wrap clue generation in async methods
- **Caching**: Cache generated clues to avoid regeneration
- **Fallback chain**: LLM → enhanced templates → basic templates
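The fallback chain can be sketched like this. The generator callables here are placeholders; in the real backend they would be `LLMClueGenerator` and the existing template services:

```python
# Sketch of the LLM -> enhanced template -> basic template fallback chain.
from typing import Callable, List, Optional

def clue_with_fallback(word: str,
                       generators: List[Callable[[str], Optional[str]]]) -> str:
    """Try each generator in order; fall back to a minimal template."""
    for generate in generators:
        try:
            clue = generate(word)
            if clue:                       # skip empty/None results
                return clue
        except Exception:
            continue                       # a failing generator falls through
    return f"{len(word)}-letter word"      # last-resort basic template

# Example: the LLM step fails, an enhanced template succeeds.
def llm_stub(word):
    raise RuntimeError("model unavailable")

def template_stub(word):
    return f"Clue for {word.upper()}"
```

With this shape, `clue_with_fallback("tiger", [llm_stub, template_stub])` returns the template clue, and an empty generator list still yields a usable basic clue.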
## Configuration Options

### LLM Settings

- `model_name`: Model to use (default: `"google/flan-t5-small"`)
- `max_length`: Maximum clue length (default: 50)
- `temperature`: Generation creativity (default: 0.7)
- `num_candidates`: Clue candidates to generate (default: 3)

### Performance Tuning

- `cache_dir`: Model cache location
- `batch_size`: Batch size for batch processing
- `device`: CPU (`-1`) or GPU (`0`, `1`, ...)
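Gathered into one object, the options above might look like the following. This is a hedged sketch: field names mirror the documented settings, but the real `LLMClueGenerator` constructor may differ, and the `batch_size` default is an assumption (it is not specified above):

```python
# Illustrative config object; defaults follow the documented values.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClueGeneratorConfig:
    model_name: str = "google/flan-t5-small"
    max_length: int = 50            # maximum clue length
    temperature: float = 0.7        # generation creativity
    num_candidates: int = 3         # clue candidates per word
    cache_dir: Optional[str] = None # model cache location
    batch_size: int = 8             # assumed default, not documented
    device: int = -1                # CPU (-1) or GPU index (0, 1, ...)

config = ClueGeneratorConfig(temperature=0.9, device=0)  # GPU 0, more creative
```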
## Troubleshooting

### Common Issues

**"transformers not available"**

Install the missing packages:

```bash
pip install transformers torch
```

**"Model download failed"**

- Check your internet connection
- Verify cache directory permissions
- Try downloading the model manually:

```python
from huggingface_hub import snapshot_download
snapshot_download('google/flan-t5-small')
```

**"Out of memory"**

- Reduce vocabulary size in the thematic generator
- Use smaller batch sizes
- Consider model quantization

**Slow generation**

- The first run downloads the model (~250MB)
- Subsequent runs use the cached model
- CPU inference is slower than GPU but more compatible
## Production Considerations

### For Hugging Face Spaces

- ✅ Model size (~250MB) fits in HF Spaces
- ✅ CPU-only inference supported
- ✅ No external API dependencies
- ⚠️ Startup time includes model download
- ⚠️ Generation time may be noticeable in the UI

### Recommendations

- Preload models during app startup
- Cache clues aggressively to avoid regeneration
- Show loading indicators during clue generation
- Implement timeouts for clue generation (fallback to templates)
- Consider async processing for better UX
Alternative Models
If flan-t5-small
doesn't meet requirements:
- Smaller:
distilgpt2
(~320MB, faster but lower quality) - Larger:
google/flan-t5-base
(~850MB, better quality but slower) - Specialized:
microsoft/DialoGPT-small
(~350MB, conversational style)
## Next Steps

1. Run tests to evaluate performance on your hardware
2. Compare clue quality with the existing template system
3. Measure actual memory usage in the HF Spaces environment
4. Integrate with the main crossword application if results are satisfactory