# Local LLM Clue Generation Prototype

This prototype integrates the existing thematic word generation with local LLM-based clue generation using `google/flan-t5-small`.
## Files

- `llm_clue_generator.py`: Core LLM clue generator using flan-t5-small
- `test_clue_generation.py`: Integration test script combining word and clue generation
- `requirements.txt`: Dependencies for the prototype
- `README_clue_generation.md`: This documentation
## Quick Start

Install dependencies:

```bash
pip install -r requirements.txt
```

Test the LLM clue generator on its own:

```bash
python llm_clue_generator.py
```

Test the full integration (word + clue generation):

```bash
python test_clue_generation.py
```
## Key Features

### LLM Clue Generator (`llm_clue_generator.py`)

- Uses `google/flan-t5-small` (~250MB), optimized for CPU inference
- Generates multiple clue candidates and selects the best one
- Supports different clue styles: definition, trivia, description, category
- Includes fallback templates when LLM generation fails
- Batch processing capability for efficiency
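The "generate candidates, pick the best" step can be sketched as follows. `pick_best_clue` and its scoring heuristic are illustrative stand-ins, not the actual logic in `llm_clue_generator.py`:

```python
# Hypothetical sketch of candidate selection: reject clues that reveal
# the answer, then prefer the fullest clue within the length cap.
from typing import List, Optional

def pick_best_clue(word: str, candidates: List[str],
                   max_length: int = 50) -> Optional[str]:
    """Return the highest-scoring candidate, or None if none is usable."""
    def score(clue: str) -> float:
        if word.lower() in clue.lower():
            return float("-inf")           # a clue must never contain the answer
        s = float(min(len(clue), max_length))  # prefer fuller clues up to the cap
        if len(clue) > max_length:
            s -= len(clue) - max_length    # penalize overlong output
        return s

    usable = [c for c in candidates if score(c) > float("-inf")]
    return max(usable, key=score) if usable else None
```

Returning `None` when every candidate is unusable is what lets the fallback templates kick in.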
### Integration Test (`test_clue_generation.py`)

- **Single Topic Test**: Generate words and clues for one topic
- **Multi-Topic Test**: Handle multiple themes with contextual clues
- **Custom Sentence Test**: Turn a personal sentence into themed word-clue pairs
- **Difficulty Comparison**: Same words with easy/medium/hard clue complexity
- **Performance Analysis**: Speed and memory usage metrics
## Expected Performance (HF Spaces)

- Initialization: ~30-60s (model download + word embeddings)
- Word generation: ~1-3s for 10 words
- Clue generation: ~2-5s per clue, depending on complexity
- Memory usage: ~1-2GB (model + embeddings + vocabulary)
## Sample Output

```text
Topic: 'animals'
1. ELEPHANT (8 letters) - Large mammal with trunk and tusks
2. TIGER (5 letters) - Striped big cat from Asia
3. PENGUIN (7 letters) - Flightless Antarctic bird
...
```
## Integration with Backend

To integrate with the main crossword application:

- **Add to ThematicWordService**: Include `LLMClueGenerator` as an optional component
- **Async support**: Wrap clue generation in async methods
- **Caching**: Cache generated clues to avoid regeneration
- **Fallback chain**: LLM → enhanced templates → basic templates
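The fallback chain can be sketched like this. The generator callables here are placeholders; in the real backend they would be `LLMClueGenerator` and the existing template services:

```python
# Sketch of the LLM -> enhanced template -> basic template fallback chain.
from typing import Callable, List, Optional

def clue_with_fallback(word: str,
                       generators: List[Callable[[str], Optional[str]]]) -> str:
    """Try each generator in order; fall back to a minimal template."""
    for generate in generators:
        try:
            clue = generate(word)
            if clue:                       # skip empty/None results
                return clue
        except Exception:
            continue                       # a failing generator falls through
    return f"{len(word)}-letter word"      # last-resort basic template

# Example: the LLM step fails, an enhanced template succeeds.
def llm_stub(word):
    raise RuntimeError("model unavailable")

def template_stub(word):
    return f"Clue for {word.upper()}"
```

With this shape, `clue_with_fallback("tiger", [llm_stub, template_stub])` returns the template clue, and an empty generator list still yields a usable basic clue.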
## Configuration Options

### LLM Settings

- `model_name`: Model to use (default: `"google/flan-t5-small"`)
- `max_length`: Maximum clue length (default: 50)
- `temperature`: Generation creativity (default: 0.7)
- `num_candidates`: Clue candidates to generate (default: 3)

### Performance Tuning

- `cache_dir`: Model cache location
- `batch_size`: Batch size for batch processing
- `device`: CPU (`-1`) or GPU (`0`, `1`, ...)
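Gathered into one object, the options above might look like the following. This is a hedged sketch: field names mirror the documented settings, but the real `LLMClueGenerator` constructor may differ, and the `batch_size` default is an assumption (it is not specified above):

```python
# Illustrative config object; defaults follow the documented values.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClueGeneratorConfig:
    model_name: str = "google/flan-t5-small"
    max_length: int = 50            # maximum clue length
    temperature: float = 0.7        # generation creativity
    num_candidates: int = 3         # clue candidates per word
    cache_dir: Optional[str] = None # model cache location
    batch_size: int = 8             # assumed default, not documented
    device: int = -1                # CPU (-1) or GPU index (0, 1, ...)

config = ClueGeneratorConfig(temperature=0.9, device=0)  # GPU 0, more creative
```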
## Troubleshooting

### Common Issues

**"transformers not available"**

Install the missing packages:

```bash
pip install transformers torch
```

**"Model download failed"**

- Check your internet connection
- Verify cache directory permissions
- Try downloading the model manually:

```python
from huggingface_hub import snapshot_download
snapshot_download('google/flan-t5-small')
```

**"Out of memory"**

- Reduce vocabulary size in the thematic generator
- Use smaller batch sizes
- Consider model quantization

**Slow generation**

- The first run downloads the model (~250MB)
- Subsequent runs use the cached model
- CPU inference is slower than GPU but more compatible
## Production Considerations

### For Hugging Face Spaces

- ✅ Model size (~250MB) fits in HF Spaces
- ✅ CPU-only inference supported
- ✅ No external API dependencies
- ⚠️ Startup time includes model download
- ⚠️ Generation time may be noticeable in the UI

### Recommendations

- Preload models during app startup
- Cache clues aggressively to avoid regeneration
- Show loading indicators during clue generation
- Implement timeouts for clue generation (fallback to templates)
- Consider async processing for better UX
Alternative Models
If flan-t5-small
doesn't meet requirements:
- Smaller:
distilgpt2
(~320MB, faster but lower quality) - Larger:
google/flan-t5-base
(~850MB, better quality but slower) - Specialized:
microsoft/DialoGPT-small
(~350MB, conversational style)
## Next Steps

1. Run tests to evaluate performance on your hardware
2. Compare clue quality with the existing template system
3. Measure actual memory usage in the HF Spaces environment
4. Integrate with the main crossword application if results are satisfactory