|
# Local LLM Clue Generation Prototype |
|
|
|
This prototype integrates the existing thematic word generation with local LLM-based clue generation using `google/flan-t5-small`. |
|
|
|
## Files |
|
|
|
- **`llm_clue_generator.py`** - Core LLM clue generator using flan-t5-small |
|
- **`test_clue_generation.py`** - Integration test script combining word + clue generation |
|
- **`requirements.txt`** - Dependencies for the prototype |
|
- **`README_clue_generation.md`** - This documentation |
|
|
|
## Quick Start |
|
|
|
1. **Install dependencies:** |
|
```bash |
|
pip install -r requirements.txt |
|
``` |
|
|
|
2. **Test LLM clue generator only:** |
|
```bash |
|
python llm_clue_generator.py |
|
``` |
|
|
|
3. **Test full integration (word + clue generation):** |
|
```bash |
|
python test_clue_generation.py |
|
``` |
|
|
|
## Key Features |
|
|
|
### LLM Clue Generator (`llm_clue_generator.py`) |
|
- Uses `google/flan-t5-small` (~250MB) optimized for CPU inference |
|
- Generates multiple clue candidates and selects the best one |
|
- Supports different clue styles: definition, trivia, description, category |
|
- Includes fallback templates when LLM generation fails |
|
- Batch processing capability for efficiency |
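
The generate-then-select flow above can be sketched as follows. This is an illustrative sketch, not the prototype's actual API: `generate_clue`, `_score`, and `FALLBACK_TEMPLATE` are hypothetical names, and the prompt wording is an assumption.

```python
# Hypothetical sketch of candidate generation + best-clue selection with
# a template fallback. Names and prompt are illustrative, not the real API.

FALLBACK_TEMPLATE = "Word related to {theme}"

def _score(clue: str, word: str) -> int:
    """Prefer short candidates; clues that leak the answer are filtered earlier."""
    return -len(clue)

def generate_clue(word: str, theme: str, pipe=None, num_candidates: int = 3) -> str:
    """Return the best of several LLM candidates, falling back to a template."""
    if pipe is not None:
        prompt = f"Write a short crossword clue for the word '{word}' (theme: {theme})."
        outputs = pipe(prompt, max_length=50, do_sample=True,
                       temperature=0.7, num_return_sequences=num_candidates)
        candidates = [o["generated_text"].strip() for o in outputs]
        # Disqualify candidates that contain the answer itself.
        valid = [c for c in candidates if word.lower() not in c.lower()]
        if valid:
            return max(valid, key=lambda c: _score(c, word))
    # Fallback when the LLM is unavailable or every candidate leaked the answer.
    return FALLBACK_TEMPLATE.format(theme=theme)
```

With `transformers` installed, `pipe` would be built via `pipeline("text2text-generation", model="google/flan-t5-small")`; without it, the template fallback keeps the function usable.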
|
|
|
### Integration Test (`test_clue_generation.py`) |
|
- **Single Topic Test**: Generate words + clues for one topic |
|
- **Multi-Topic Test**: Handle multiple themes with contextual clues |
|
- **Custom Sentence Test**: Converts a personal sentence into themed word-clue pairs
|
- **Difficulty Comparison**: Generates clues for the same words at easy, medium, and hard complexity
|
- **Performance Analysis**: Speed and memory usage metrics |
|
|
|
## Expected Performance (HF Spaces) |
|
|
|
- **Initialization**: ~30-60s (model download + word embeddings) |
|
- **Word Generation**: ~1-3s for 10 words |
|
- **Clue Generation**: ~2-5s per clue (depends on complexity) |
|
- **Memory Usage**: ~1-2GB (model + embeddings + vocabulary) |
|
|
|
## Sample Output |
|
|
|
``` |
|
Topic: 'animals' |
|
1. ELEPHANT (8 letters) - Large mammal with trunk and tusks |
|
2. TIGER (5 letters) - Striped big cat from Asia |
|
3. PENGUIN (7 letters) - Flightless Antarctic bird |
|
... |
|
``` |
|
|
|
## Integration with Backend |
|
|
|
To integrate with the main crossword application: |
|
|
|
1. **Add to ThematicWordService**: Include LLMClueGenerator as optional component |
|
2. **Async Support**: Wrap clue generation in async methods |
|
3. **Caching**: Cache generated clues to avoid regeneration |
|
4. **Fallback Chain**: LLM → Enhanced Templates → Basic Templates |
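
The fallback chain in step 4 could look like the sketch below. `clue_with_fallback` and `enhanced_template` are hypothetical helpers, not names from the codebase:

```python
# Sketch of the LLM -> enhanced templates -> basic templates fallback chain.

def clue_with_fallback(word, theme, generators):
    """Try each clue source in order; return the first non-empty result."""
    for gen in generators:
        try:
            clue = gen(word, theme)
            if clue:
                return clue
        except Exception:
            continue  # this source failed; fall through to the next one
    return f"Relates to {theme}"  # last-resort basic template

def enhanced_template(word, theme):
    # Stand-in for the enhanced-template stage; real logic lives elsewhere.
    return f"{len(word)}-letter word from the '{theme}' theme"
```

The chain degrades gracefully: a raised exception or empty string from one source simply hands control to the next, so an LLM outage never blocks clue delivery.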
|
|
|
## Configuration Options |
|
|
|
### LLM Settings |
|
- `model_name`: Change model (default: "google/flan-t5-small") |
|
- `max_length`: Maximum clue length (default: 50) |
|
- `temperature`: Generation creativity (default: 0.7) |
|
- `num_candidates`: Clue candidates to generate (default: 3) |
|
|
|
### Performance Tuning |
|
- `cache_dir`: Model cache location |
|
- `batch_size`: For batch processing |
|
- `device`: CPU (-1) or GPU (0, 1, ...) |
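
These options could be bundled into a single config object; the dataclass below is a sketch (`ClueGeneratorConfig` is a hypothetical name), with field names mirroring the settings listed above and defaults as stated (the `batch_size` default is an illustrative assumption, since none is given):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClueGeneratorConfig:
    """Hypothetical bundle of the LLM and performance settings listed above."""
    model_name: str = "google/flan-t5-small"
    max_length: int = 50             # maximum clue length
    temperature: float = 0.7         # generation creativity
    num_candidates: int = 3          # clue candidates per word
    cache_dir: Optional[str] = None  # model cache location
    batch_size: int = 8              # assumed default; tune for your hardware
    device: int = -1                 # -1 = CPU; 0, 1, ... = GPU index
```

Centralizing the settings this way keeps defaults in one place and makes experiments (e.g. swapping in `flan-t5-base`) a one-line change.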
|
|
|
## Troubleshooting |
|
|
|
### Common Issues |
|
|
|
1. **"transformers not available"** |
|
- Install: `pip install transformers torch` |
|
|
|
2. **"Model download failed"** |
|
- Check internet connection |
|
- Verify cache directory permissions |
|
   - Try a manual download: `python -c "from huggingface_hub import snapshot_download; snapshot_download('google/flan-t5-small')"`
|
|
|
3. **"Out of memory"** |
|
- Reduce vocabulary size in thematic generator |
|
- Use smaller batch sizes |
|
- Consider model quantization |
|
|
|
4. **Slow generation** |
|
- First run downloads model (~250MB) |
|
- Subsequent runs use cached model |
|
- CPU inference is slower than GPU but more compatible |
|
|
|
## Production Considerations |
|
|
|
### For Hugging Face Spaces |
|
- ✅ Model size (~250MB) fits in HF Spaces |
|
- ✅ CPU-only inference supported |
|
- ✅ No external API dependencies |
|
- ⚠️ Startup time includes model download |
|
- ⚠️ Generation time may be noticeable in UI |
|
|
|
### Recommendations |
|
1. **Preload models** during app startup |
|
2. **Cache clues** aggressively to avoid regeneration |
|
3. **Show loading indicators** during clue generation |
|
4. **Implement timeouts** for clue generation (fallback to templates) |
|
5. **Consider async processing** for better UX |
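
Recommendations 2 and 4 can be combined in a small wrapper; the sketch below assumes a placeholder `llm_clue` function standing in for the real `LLMClueGenerator` call:

```python
import concurrent.futures
from functools import lru_cache

_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def llm_clue(word, theme):
    # Stand-in for the real LLM call; replace with LLMClueGenerator.
    return f"Clue for {word}"

@lru_cache(maxsize=1024)                 # recommendation 2: cache per (word, theme)
def clue_with_timeout(word, theme, timeout_s=5.0):
    """Return an LLM clue, or a template if generation exceeds the deadline."""
    future = _EXECUTOR.submit(llm_clue, word, theme)
    try:
        return future.result(timeout=timeout_s)  # recommendation 4: bounded wait
    except concurrent.futures.TimeoutError:
        return f"Relates to {theme}"             # fall back to a template
```

One caveat of this naive sketch: a timed-out fallback result is cached too, so a production version might evict such entries and retry later.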
|
|
|
## Alternative Models |
|
|
|
If `flan-t5-small` doesn't meet requirements: |
|
|
|
- **Faster**: `distilgpt2` (~320MB, quicker inference but lower clue quality)
|
- **Larger**: `google/flan-t5-base` (~850MB, better quality but slower) |
|
- **Specialized**: `microsoft/DialoGPT-small` (~350MB, conversational style) |
|
|
|
## Next Steps |
|
|
|
1. Run tests to evaluate performance on your hardware |
|
2. Compare clue quality with existing template system |
|
3. Measure actual memory usage in HF Spaces environment |
|
4. Integrate with main crossword application if results are satisfactory |