
Local LLM Clue Generation Prototype

This prototype integrates the existing thematic word generation with local LLM-based clue generation using google/flan-t5-small.

Files

  • llm_clue_generator.py - Core LLM clue generator using flan-t5-small
  • test_clue_generation.py - Integration test script combining word + clue generation
  • requirements.txt - Dependencies for the prototype
  • README_clue_generation.md - This documentation

Quick Start

  1. Install dependencies:

    pip install -r requirements.txt
    
  2. Test LLM clue generator only:

    python llm_clue_generator.py
    
  3. Test full integration (word + clue generation):

    python test_clue_generation.py
    

Key Features

LLM Clue Generator (llm_clue_generator.py)

  • Uses google/flan-t5-small (~250MB) optimized for CPU inference
  • Generates multiple clue candidates and selects the best one
  • Supports different clue styles: definition, trivia, description, category
  • Includes fallback templates when LLM generation fails
  • Batch processing capability for efficiency
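
The candidate-and-select strategy above can be sketched roughly as follows, assuming the transformers text2text-generation pipeline. The function names and fallback wording are illustrative, not the prototype's actual API:

```python
from typing import List


def select_best(word: str, candidates: List[str]) -> str:
    """Pick the best clue candidate: the shortest non-empty clue that
    does not leak the answer word; otherwise fall back to a template."""
    valid = [c.strip() for c in candidates
             if c.strip() and word.lower() not in c.lower()]
    if valid:
        return min(valid, key=len)
    return f"Crossword answer with {len(word)} letters"  # template fallback


def generate_clue(word: str, num_candidates: int = 3) -> str:
    try:
        from transformers import pipeline
        generator = pipeline("text2text-generation",
                             model="google/flan-t5-small", device=-1)  # CPU
        outputs = generator(
            f"Write a short crossword clue for the word '{word}':",
            max_length=50, do_sample=True, temperature=0.7,
            num_return_sequences=num_candidates,
        )
        return select_best(word, [o["generated_text"] for o in outputs])
    except Exception:
        # transformers missing or generation failed: use the template path
        return select_best(word, [])
```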

Integration Test (test_clue_generation.py)

  • Single Topic Test: Generate words + clues for one topic
  • Multi-Topic Test: Handle multiple themes with contextual clues
  • Custom Sentence Test: Turns a personal sentence into themed word-clue pairs
  • Difficulty Comparison: Same words with easy/medium/hard clue complexity
  • Performance Analysis: Speed and memory usage metrics

Expected Performance (HF Spaces)

  • Initialization: ~30-60s (model download + word embeddings)
  • Word Generation: ~1-3s for 10 words
  • Clue Generation: ~2-5s per clue (depends on complexity)
  • Memory Usage: ~1-2GB (model + embeddings + vocabulary)
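
To reproduce these timings on your own hardware, a minimal harness along these lines can help. Here `generate_words` is a stand-in for the prototype's word generator, not its real name:

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in for the real word generator, just to show the harness shape.
def generate_words(topic: str):
    return ["ELEPHANT", "TIGER", "PENGUIN"]


words, elapsed = timed(generate_words, "animals")
print(f"word generation: {elapsed:.2f}s for {len(words)} words")
```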

Sample Output

Topic: 'animals'
1. ELEPHANT    (8 letters) - Large mammal with trunk and tusks
2. TIGER       (5 letters) - Striped big cat from Asia
3. PENGUIN     (7 letters) - Flightless Antarctic bird
...

Integration with Backend

To integrate with the main crossword application:

  1. Add to ThematicWordService: Include LLMClueGenerator as optional component
  2. Async Support: Wrap clue generation in async methods
  3. Caching: Cache generated clues to avoid regeneration
  4. Fallback Chain: LLM → Enhanced Templates → Basic Templates
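
The four steps above could be wired together roughly like this. `llm_generate` and `enhanced_template` are placeholder hooks, not existing functions; here they simply fail so the chain falls through to the basic template:

```python
import asyncio
from functools import lru_cache


def llm_generate(word: str, topic: str):
    """Placeholder for the LLM call; returns None to simulate failure."""
    return None


def enhanced_template(word: str, topic: str):
    """Placeholder for the enhanced-template stage."""
    return None


@lru_cache(maxsize=1024)  # step 3: cache clues to avoid regeneration
def clue_sync(word: str, topic: str) -> str:
    # Step 4: fallback chain LLM -> enhanced templates -> basic templates.
    return (llm_generate(word, topic)
            or enhanced_template(word, topic)
            or f"{len(word)}-letter word related to {topic}")


async def get_clue(word: str, topic: str) -> str:
    # Step 2: run the blocking generator off the event loop.
    return await asyncio.to_thread(clue_sync, word, topic)
```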

Configuration Options

LLM Settings

  • model_name: Change model (default: "google/flan-t5-small")
  • max_length: Maximum clue length (default: 50)
  • temperature: Generation creativity (default: 0.7)
  • num_candidates: Clue candidates to generate (default: 3)

Performance Tuning

  • cache_dir: Model cache location
  • batch_size: For batch processing
  • device: CPU (-1) or GPU (0, 1, ...)
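
These options could be bundled into a single config object, for example as a dataclass. The field names mirror the bullets above, but the class itself (and its defaults for cache_dir and batch_size) is a sketch, not the prototype's actual interface:

```python
from dataclasses import dataclass


@dataclass
class ClueGenConfig:
    # LLM settings
    model_name: str = "google/flan-t5-small"
    max_length: int = 50        # maximum clue length in tokens
    temperature: float = 0.7    # higher = more creative output
    num_candidates: int = 3     # candidates generated per word
    # Performance tuning
    cache_dir: str = "./model_cache"
    batch_size: int = 8
    device: int = -1            # -1 = CPU, 0/1/... = GPU index


cfg = ClueGenConfig(temperature=0.9, device=0)
```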

Troubleshooting

Common Issues

  1. "transformers not available"

    • Install: pip install transformers torch
  2. "Model download failed"

    • Check internet connection
    • Verify cache directory permissions
    • Try: python -c "from huggingface_hub import snapshot_download; snapshot_download('google/flan-t5-small')"
  3. "Out of memory"

    • Reduce vocabulary size in thematic generator
    • Use smaller batch sizes
    • Consider model quantization
  4. Slow generation

    • First run downloads model (~250MB)
    • Subsequent runs use cached model
    • CPU inference is slower than GPU but more compatible
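
Issues 1 and 2 can be caught up front with a small preflight check. The `download` parameter is injectable so the check can be exercised offline; by default it uses huggingface_hub's `snapshot_download`, which is a no-op once the model is cached:

```python
def preflight(model_id: str = "google/flan-t5-small", download=None) -> list:
    """Return a list of problems; an empty list means ready to generate."""
    problems = []
    try:
        import transformers  # noqa: F401
    except ImportError:
        problems.append("transformers not available: pip install transformers torch")

    if download is None:
        def download(mid):
            from huggingface_hub import snapshot_download
            snapshot_download(mid)  # no-op if the model is already cached
    try:
        download(model_id)
    except Exception as exc:
        problems.append(f"Model download failed: {exc}")
    return problems
```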

Production Considerations

For Hugging Face Spaces

  • ✅ Model size (~250MB) fits in HF Spaces
  • ✅ CPU-only inference supported
  • ✅ No external API dependencies
  • ⚠️ Startup time includes model download
  • ⚠️ Generation time may be noticeable in UI

Recommendations

  1. Preload models during app startup
  2. Cache clues aggressively to avoid regeneration
  3. Show loading indicators during clue generation
  4. Implement timeouts for clue generation (fallback to templates)
  5. Consider async processing for better UX
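
Recommendation 4 (timeouts with a template fallback) might look like this; `slow_generate` stands in for the real LLM call:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=2)


def clue_with_timeout(word: str, slow_generate, timeout: float = 5.0) -> str:
    """Run the clue generator with a deadline; fall back to a template."""
    future = _pool.submit(slow_generate, word)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        future.cancel()  # best effort; a running task cannot be interrupted
        return f"{len(word)}-letter crossword answer"  # template fallback
```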

Alternative Models

If flan-t5-small doesn't meet requirements:

  • Faster: distilgpt2 (~320MB, quicker inference but lower clue quality)
  • Larger: google/flan-t5-base (~850MB, better quality but slower)
  • Specialized: microsoft/DialoGPT-small (~350MB, conversational style)

Next Steps

  1. Run tests to evaluate performance on your hardware
  2. Compare clue quality with existing template system
  3. Measure actual memory usage in HF Spaces environment
  4. Integrate with main crossword application if results are satisfactory