|
# Local LLM Clue Generation Prototype |
|
|
|
This prototype integrates the existing thematic word generation with local LLM-based clue generation using `google/flan-t5-small`. |
|
|
|
## Files |
|
|
|
- **`llm_clue_generator.py`** - Core LLM clue generator using flan-t5-small |
|
- **`test_clue_generation.py`** - Integration test script combining word + clue generation |
|
- **`requirements.txt`** - Dependencies for the prototype |
|
- **`README_clue_generation.md`** - This documentation |
|
|
|
## Quick Start |
|
|
|
1. **Install dependencies:** |
|
```bash |
|
pip install -r requirements.txt |
|
``` |
|
|
|
2. **Test LLM clue generator only:** |
|
```bash |
|
python llm_clue_generator.py |
|
``` |
|
|
|
3. **Test full integration (word + clue generation):** |
|
```bash |
|
python test_clue_generation.py |
|
``` |
|
|
|
## Key Features |
|
|
|
### LLM Clue Generator (`llm_clue_generator.py`) |
|
- Uses `google/flan-t5-small` (~250MB) optimized for CPU inference |
|
- Generates multiple clue candidates and selects the best one |
|
- Supports different clue styles: definition, trivia, description, category |
|
- Includes fallback templates when LLM generation fails |
|
- Batch processing capability for efficiency |
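
The generate-then-select flow above can be sketched as follows. This is an illustrative sketch, not the prototype's actual API: `generate_clue`, `_score`, and `FALLBACK_TEMPLATE` are hypothetical names, and the prompt wording is an assumption.

```python
# Hypothetical sketch of candidate generation + best-clue selection with
# a template fallback. Names and prompt are illustrative, not the real API.

FALLBACK_TEMPLATE = "Word related to {theme}"

def _score(clue: str, word: str) -> int:
    """Prefer short candidates; clues that leak the answer are filtered earlier."""
    return -len(clue)

def generate_clue(word: str, theme: str, pipe=None, num_candidates: int = 3) -> str:
    """Return the best of several LLM candidates, falling back to a template."""
    if pipe is not None:
        prompt = f"Write a short crossword clue for the word '{word}' (theme: {theme})."
        outputs = pipe(prompt, max_length=50, do_sample=True,
                       temperature=0.7, num_return_sequences=num_candidates)
        candidates = [o["generated_text"].strip() for o in outputs]
        # Disqualify candidates that contain the answer itself.
        valid = [c for c in candidates if word.lower() not in c.lower()]
        if valid:
            return max(valid, key=lambda c: _score(c, word))
    # Fallback when the LLM is unavailable or every candidate leaked the answer.
    return FALLBACK_TEMPLATE.format(theme=theme)
```

With `transformers` installed, `pipe` would be built via `pipeline("text2text-generation", model="google/flan-t5-small")`; without it, the template fallback keeps the function usable.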
|
|
|
### Integration Test (`test_clue_generation.py`) |
|
- **Single Topic Test**: Generate words + clues for one topic |
|
- **Multi-Topic Test**: Handle multiple themes with contextual clues |
|
- **Custom Sentence Test**: Converts a personal sentence into themed word-clue pairs
|
- **Difficulty Comparison**: Generates clues for the same words at easy, medium, and hard complexity
|
- **Performance Analysis**: Speed and memory usage metrics |
|
|
|
## Expected Performance (HF Spaces) |
|
|
|
- **Initialization**: ~30-60s (model download + word embeddings) |
|
- **Word Generation**: ~1-3s for 10 words |
|
- **Clue Generation**: ~2-5s per clue (depends on complexity) |
|
- **Memory Usage**: ~1-2GB (model + embeddings + vocabulary) |
|
|
|
## Sample Output |
|
|
|
``` |
|
Topic: 'animals' |
|
1. ELEPHANT (8 letters) - Large mammal with trunk and tusks |
|
2. TIGER (5 letters) - Striped big cat from Asia |
|
3. PENGUIN (7 letters) - Flightless Antarctic bird |
|
... |
|
``` |
|
|
|
## Integration with Backend |
|
|
|
To integrate with the main crossword application: |
|
|
|
1. **Add to ThematicWordService**: Include LLMClueGenerator as optional component |
|
2. **Async Support**: Wrap clue generation in async methods |
|
3. **Caching**: Cache generated clues to avoid regeneration |
|
4. **Fallback Chain**: LLM → Enhanced Templates → Basic Templates |
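
The fallback chain in step 4 could look like the sketch below. `clue_with_fallback` and `enhanced_template` are hypothetical helpers, not names from the codebase:

```python
# Sketch of the LLM -> enhanced templates -> basic templates fallback chain.

def clue_with_fallback(word, theme, generators):
    """Try each clue source in order; return the first non-empty result."""
    for gen in generators:
        try:
            clue = gen(word, theme)
            if clue:
                return clue
        except Exception:
            continue  # this source failed; fall through to the next one
    return f"Relates to {theme}"  # last-resort basic template

def enhanced_template(word, theme):
    # Stand-in for the enhanced-template stage; real logic lives elsewhere.
    return f"{len(word)}-letter word from the '{theme}' theme"
```

The chain degrades gracefully: a raised exception or empty string from one source simply hands control to the next, so an LLM outage never blocks clue delivery.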
|
|
|
## Configuration Options |
|
|
|
### LLM Settings |
|
- `model_name`: Change model (default: "google/flan-t5-small") |
|
- `max_length`: Maximum clue length (default: 50) |
|
- `temperature`: Generation creativity (default: 0.7) |
|
- `num_candidates`: Clue candidates to generate (default: 3) |
|
|
|
### Performance Tuning |
|
- `cache_dir`: Model cache location |
|
- `batch_size`: For batch processing |
|
- `device`: CPU (-1) or GPU (0, 1, ...) |
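
These options could be bundled into a single config object; the dataclass below is a sketch (`ClueGeneratorConfig` is a hypothetical name), with field names mirroring the settings listed above and defaults as stated (the `batch_size` default is an illustrative assumption, since none is given):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClueGeneratorConfig:
    """Hypothetical bundle of the LLM and performance settings listed above."""
    model_name: str = "google/flan-t5-small"
    max_length: int = 50             # maximum clue length
    temperature: float = 0.7         # generation creativity
    num_candidates: int = 3          # clue candidates per word
    cache_dir: Optional[str] = None  # model cache location
    batch_size: int = 8              # assumed default; tune for your hardware
    device: int = -1                 # -1 = CPU; 0, 1, ... = GPU index
```

Centralizing the settings this way keeps defaults in one place and makes experiments (e.g. swapping in `flan-t5-base`) a one-line change.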
|
|
|
## Troubleshooting |
|
|
|
### Common Issues |
|
|
|
1. **"transformers not available"** |
|
- Install: `pip install transformers torch` |
|
|
|
2. **"Model download failed"** |
|
- Check internet connection |
|
- Verify cache directory permissions |
|
   - Try a manual download: `python -c "from huggingface_hub import snapshot_download; snapshot_download('google/flan-t5-small')"`
|
|
|
3. **"Out of memory"** |
|
- Reduce vocabulary size in thematic generator |
|
- Use smaller batch sizes |
|
- Consider model quantization |
|
|
|
4. **Slow generation** |
|
- First run downloads model (~250MB) |
|
- Subsequent runs use cached model |
|
- CPU inference is slower than GPU but more compatible |
|
|
|
## Production Considerations |
|
|
|
### For Hugging Face Spaces |
|
- ✅ Model size (~250MB) fits in HF Spaces |
|
- ✅ CPU-only inference supported |
|
- ✅ No external API dependencies |
|
- ⚠️ Startup time includes model download |
|
- ⚠️ Generation time may be noticeable in UI |
|
|
|
### Recommendations |
|
1. **Preload models** during app startup |
|
2. **Cache clues** aggressively to avoid regeneration |
|
3. **Show loading indicators** during clue generation |
|
4. **Implement timeouts** for clue generation (fallback to templates) |
|
5. **Consider async processing** for better UX |
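
Recommendations 2 and 4 can be combined in a small wrapper; the sketch below assumes a placeholder `llm_clue` function standing in for the real `LLMClueGenerator` call:

```python
import concurrent.futures
from functools import lru_cache

_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def llm_clue(word, theme):
    # Stand-in for the real LLM call; replace with LLMClueGenerator.
    return f"Clue for {word}"

@lru_cache(maxsize=1024)                 # recommendation 2: cache per (word, theme)
def clue_with_timeout(word, theme, timeout_s=5.0):
    """Return an LLM clue, or a template if generation exceeds the deadline."""
    future = _EXECUTOR.submit(llm_clue, word, theme)
    try:
        return future.result(timeout=timeout_s)  # recommendation 4: bounded wait
    except concurrent.futures.TimeoutError:
        return f"Relates to {theme}"             # fall back to a template
```

One caveat of this naive sketch: a timed-out fallback result is cached too, so a production version might evict such entries and retry later.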
|
|
|
## Alternative Models |
|
|
|
If `flan-t5-small` doesn't meet requirements: |
|
|
|
- **Faster**: `distilgpt2` (~320MB, quicker inference but lower clue quality)
|
- **Larger**: `google/flan-t5-base` (~850MB, better quality but slower) |
|
- **Specialized**: `microsoft/DialoGPT-small` (~350MB, conversational style) |
|
|
|
## Next Steps |
|
|
|
1. Run tests to evaluate performance on your hardware |
|
2. Compare clue quality with existing template system |
|
3. Measure actual memory usage in HF Spaces environment |
|
4. Integrate with main crossword application if results are satisfactory |