# Local LLM Clue Generation Prototype

This prototype integrates the existing thematic word generation with local LLM-based clue generation using `google/flan-t5-small`.

## Files

- **`llm_clue_generator.py`** - Core LLM clue generator using flan-t5-small
- **`test_clue_generation.py`** - Integration test script combining word + clue generation
- **`requirements.txt`** - Dependencies for the prototype
- **`README_clue_generation.md`** - This documentation

## Quick Start

1. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

2. **Test LLM clue generator only:**
   ```bash
   python llm_clue_generator.py
   ```

3. **Test full integration (word + clue generation):**
   ```bash
   python test_clue_generation.py
   ```

## Key Features

### LLM Clue Generator (`llm_clue_generator.py`)
- Uses `google/flan-t5-small` (~250MB), small enough for CPU-only inference
- Generates multiple clue candidates per word and selects the best one (see the sketch below)
- Supports different clue styles: definition, trivia, description, category
- Falls back to template clues when LLM generation fails
- Provides batch processing for efficiency
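
A minimal sketch of this candidate-generation approach using the standard `transformers` pipeline API; the prompt wording and the selection heuristic are illustrative, not the prototype's exact implementation:

```python
from transformers import pipeline

# Load flan-t5-small for CPU inference (device=-1).
generator = pipeline(
    "text2text-generation",
    model="google/flan-t5-small",
    device=-1,
)

def generate_clue(word: str, num_candidates: int = 3) -> str:
    prompt = (
        f"Write a short crossword clue for the word '{word}'. "
        "Do not use the word itself."
    )
    candidates = generator(
        prompt,
        max_length=50,
        num_return_sequences=num_candidates,
        do_sample=True,
        temperature=0.7,
    )
    # Illustrative selection: prefer candidates that avoid the answer word.
    valid = [c["generated_text"] for c in candidates
             if word.lower() not in c["generated_text"].lower()]
    return valid[0] if valid else candidates[0]["generated_text"]

print(generate_clue("ELEPHANT"))
```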

### Integration Test (`test_clue_generation.py`)
- **Single Topic Test**: Generate words + clues for one topic
- **Multi-Topic Test**: Handle multiple themes with contextual clues
- **Custom Sentence Test**: Converts a personal sentence into themed word-clue pairs
- **Difficulty Comparison**: Same words with easy/medium/hard clue complexity
- **Performance Analysis**: Speed and memory usage metrics

## Expected Performance (HF Spaces)

- **Initialization**: ~30-60s (model download + word embeddings)
- **Word Generation**: ~1-3s for 10 words
- **Clue Generation**: ~2-5s per clue (depends on complexity)
- **Memory Usage**: ~1-2GB (model + embeddings + vocabulary)

## Sample Output

```
Topic: 'animals'
1. ELEPHANT    (8 letters) - Large mammal with trunk and tusks
2. TIGER       (5 letters) - Striped big cat from Asia
3. PENGUIN     (7 letters) - Flightless Antarctic bird
...
```

## Integration with Backend

To integrate with the main crossword application:

1. **Add to ThematicWordService**: Include LLMClueGenerator as an optional component
2. **Async Support**: Wrap clue generation in async methods
3. **Caching**: Cache generated clues to avoid regeneration
4. **Fallback Chain**: LLM → Enhanced Templates → Basic Templates (see the sketch after this list)
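
A hypothetical sketch combining steps 2–4; the generator objects and their `generate_clue(word, theme)` method are assumptions, not the actual backend API:

```python
import asyncio

clue_cache = {}  # (word, theme) -> clue

async def get_clue(llm_gen, template_gen, word: str, theme: str) -> str:
    """LLM -> template fallback with caching. Both generator objects are
    assumed to expose a blocking generate_clue(word, theme) method."""
    key = (word, theme)
    if key in clue_cache:
        return clue_cache[key]
    try:
        # Run the blocking CPU inference off the event loop, with a timeout.
        clue = await asyncio.wait_for(
            asyncio.to_thread(llm_gen.generate_clue, word, theme),
            timeout=10.0,
        )
    except Exception:
        # Timeout or model error: fall back to template-based clues.
        clue = template_gen.generate_clue(word, theme)
    clue_cache[key] = clue
    return clue
```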

## Configuration Options

### LLM Settings
- `model_name`: Change model (default: "google/flan-t5-small")
- `max_length`: Maximum clue length (default: 50)
- `temperature`: Generation creativity (default: 0.7)
- `num_candidates`: Clue candidates to generate (default: 3)

### Performance Tuning
- `cache_dir`: Model cache location
- `batch_size`: For batch processing
- `device`: CPU (-1) or GPU (0, 1, ...) (see the combined example below)
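
Putting these together, an illustrative instantiation, assuming `LLMClueGenerator` accepts the options above as keyword arguments:

```python
from llm_clue_generator import LLMClueGenerator

generator = LLMClueGenerator(
    model_name="google/flan-t5-small",  # default model
    max_length=50,                      # maximum clue length
    temperature=0.7,                    # generation creativity
    num_candidates=3,                   # candidates per word
    cache_dir="./model_cache",          # where to cache the model
    device=-1,                          # -1 = CPU, 0+ = GPU index
)
```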

## Troubleshooting

### Common Issues

1. **"transformers not available"**
   - Install: `pip install transformers torch`

2. **"Model download failed"**
   - Check internet connection
   - Verify cache directory permissions
   - Pre-download manually: `python -c "from huggingface_hub import snapshot_download; snapshot_download('google/flan-t5-small')"`

3. **"Out of memory"**
   - Reduce vocabulary size in thematic generator
   - Use smaller batch sizes
   - Consider model quantization (see the sketch below)

4. **Slow generation**
   - First run downloads model (~250MB)
   - Subsequent runs use cached model
   - CPU inference is slower than GPU but more compatible
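
For issue 3, dynamic quantization is one option. A sketch using PyTorch's built-in `quantize_dynamic`; whether the quantized model still produces acceptable clues is something to verify:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# Quantize the Linear layers to int8 to shrink the CPU memory footprint.
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Write a short crossword clue for the word 'TIGER'.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```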

## Production Considerations

### For Hugging Face Spaces
- ✅ Model size (~250MB) fits in HF Spaces
- ✅ CPU-only inference supported
- ✅ No external API dependencies
- ⚠️ Startup time includes model download
- ⚠️ Generation time may be noticeable in UI

### Recommendations
1. **Preload models** during app startup (see the sketch below)
2. **Cache clues** aggressively to avoid regeneration
3. **Show loading indicators** during clue generation
4. **Implement timeouts** for clue generation (fallback to templates)
5. **Consider async processing** for better UX
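
A minimal sketch of recommendation 1, assuming a module-level singleton and the prototype's (assumed) `generate_clue(word, theme)` interface:

```python
# Illustrative preload pattern: build the generator once at process
# startup so the first user request does not pay the model-load cost.
from llm_clue_generator import LLMClueGenerator

CLUE_GENERATOR = LLMClueGenerator()  # downloads/loads flan-t5-small once

def warm_up() -> None:
    # Optional: run one dummy generation so any lazy initialization also
    # happens at startup rather than on the first user request.
    CLUE_GENERATOR.generate_clue("EXAMPLE", "general")
```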

## Alternative Models

If `flan-t5-small` doesn't meet requirements:

- **Faster**: `distilgpt2` (~320MB, quicker inference but lower clue quality)
- **Larger**: `google/flan-t5-base` (~850MB, better quality but slower)
- **Specialized**: `microsoft/DialoGPT-small` (~350MB, conversational style)

## Next Steps

1. Run tests to evaluate performance on your hardware
2. Compare clue quality with existing template system
3. Measure actual memory usage in HF Spaces environment
4. Integrate with main crossword application if results are satisfactory