---
title: Slash
emoji: 🌖
colorFrom: purple
colorTo: gray
sdk: streamlit
sdk_version: 1.25.0
pinned: false
license: mit
short_description: An AI-powered book summarizer
---

πŸ“š Book Summarizer AI

An intelligent web application that extracts text from PDF books and generates comprehensive summaries using state-of-the-art AI models.

## ✨ Features

- **📚 PDF Text Extraction**: Advanced PDF processing with multiple extraction methods
- **🤖 AI-Powered Summarization**: Uses transformer models (BART, T5) for high-quality summaries
- **🌐 Beautiful Web Interface**: Modern UI built with Streamlit
- **⚡ FastAPI Backend**: Scalable and fast API for processing
- **📝 Configurable Settings**: Adjust summary length, chunk size, and AI models
- **📊 Text Analysis**: Detailed statistics about book content
- **💾 Download Summaries**: Save summaries as text files

πŸš€ Quick Start

Option 1: Automated Setup (Recommended)

Windows:

# Double-click start.bat or run:
start.bat

Unix/Linux/Mac:

# Make script executable and run:
chmod +x start.sh
./start.sh

### Option 2: Manual Setup

1. **Install dependencies:**

   ```shell
   pip install -r requirements.txt
   ```

2. **Download NLTK data:**

   ```shell
   python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
   ```

3. **Start the FastAPI backend:**

   ```shell
   uvicorn api.main:app --reload --port 8000
   ```

4. **Start the Streamlit frontend:**

   ```shell
   streamlit run app.py
   ```

5. **Open your browser:** Streamlit serves the app at `http://localhost:8501` by default; the API runs at `http://localhost:8000`.

πŸ“– Usage

  1. Upload PDF: Select a PDF book file (max 50MB)
  2. Configure Settings: Choose AI model and summary parameters
  3. Generate Summary: Click "Generate Summary" and wait for processing
  4. Download Result: Save your AI-generated summary

πŸ› οΈ Technology Stack

Frontend

  • Streamlit: Modern web interface
  • Custom CSS: Beautiful styling and responsive design

Backend

  • FastAPI: High-performance API framework
  • Uvicorn: ASGI server for FastAPI

AI & ML

  • Hugging Face Transformers: State-of-the-art NLP models
  • PyTorch: Deep learning framework
  • BART/T5 Models: Pre-trained summarization models

PDF Processing

  • PyPDF2: PDF text extraction
  • pdfplumber: Advanced PDF processing
  • NLTK: Natural language processing

πŸ“ Project Structure

book-summarizer/
β”œβ”€β”€ app.py                 # Streamlit frontend
β”œβ”€β”€ start.py              # Automated startup script
β”œβ”€β”€ start.bat             # Windows startup script
β”œβ”€β”€ start.sh              # Unix/Linux/Mac startup script
β”œβ”€β”€ api/
β”‚   β”œβ”€β”€ __init__.py       # API package
β”‚   β”œβ”€β”€ main.py           # FastAPI backend
β”‚   β”œβ”€β”€ pdf_processor.py  # PDF text extraction
β”‚   β”œβ”€β”€ summarizer.py     # AI summarization logic
β”‚   └── utils.py          # Utility functions
β”œβ”€β”€ requirements.txt      # Python dependencies
└── README.md            # Project documentation

βš™οΈ Configuration

AI Models

  • facebook/bart-large-cnn: Best quality, slower processing
  • t5-small: Faster processing, good quality
  • facebook/bart-base: Balanced performance
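The trade-offs above could be encoded in a small selection helper on the frontend side. This is only an illustrative sketch: `MODEL_NOTES` and `pick_model` are hypothetical names, not part of the actual app.

```python
# Hypothetical helper encoding the documented speed/quality trade-offs.
# The checkpoint names are the ones listed above; the rest is illustration.
MODEL_NOTES = {
    "facebook/bart-large-cnn": "best quality, slower processing",
    "t5-small": "faster processing, good quality",
    "facebook/bart-base": "balanced performance",
}

def pick_model(prefer_speed: bool = False) -> str:
    """Return a checkpoint name for a simple speed/quality preference."""
    return "t5-small" if prefer_speed else "facebook/bart-large-cnn"

for name, note in MODEL_NOTES.items():
    print(f"{name}: {note}")
print("selected:", pick_model(prefer_speed=True))
```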

### Summary Settings

- **Max Length**: 50-500 words (default: 150)
- **Min Length**: 10-200 words (default: 50)
- **Chunk Size**: 500-2000 characters (default: 1000)
- **Overlap**: 50-200 characters (default: 100)
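To illustrate how **Chunk Size** and **Overlap** interact, here is a minimal plain-Python sketch of overlapping chunking. The real logic lives in `api/summarizer.py` and may differ; the function name `chunk_text` is an assumption.

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split `text` into chunks of `chunk_size` characters, where each chunk
    repeats the last `overlap` characters of the previous one, so sentences
    that straddle a chunk boundary keep some surrounding context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # advance, stepping back by `overlap`
    return chunks

sample = "x" * 2500
parts = chunk_text(sample, chunk_size=1000, overlap=100)
print(len(parts), [len(p) for p in parts])
```

Smaller chunks mean more (but faster) summarization calls; the overlap is why the total chunk length slightly exceeds the input length.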

πŸ”§ API Endpoints

  • GET / - API information
  • GET /health - Health check
  • POST /upload-pdf - Validate PDF file
  • POST /extract-text - Extract text from PDF
  • POST /summarize - Generate book summary
  • GET /models - List available AI models
  • POST /change-model - Switch AI model
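Once the backend is running (see Quick Start), these endpoints can be called with any HTTP client. A stdlib-only sketch of calling `POST /summarize` follows; the JSON field names (`text`, `max_length`, `min_length`) are assumptions, so check `api/main.py` for the actual request schema.

```python
import json
from urllib import request

API_BASE = "http://localhost:8000"  # default port from the Quick Start

def summarize_request(text, max_length=150, min_length=50):
    """Build a POST /summarize request; field names are assumptions."""
    body = json.dumps({
        "text": text,
        "max_length": max_length,
        "min_length": min_length,
    }).encode("utf-8")
    return request.Request(
        f"{API_BASE}/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = summarize_request("Chapter 1. It was the best of times...")
print(req.full_url, req.get_method())
# To actually send it (requires the backend to be running):
#   with request.urlopen(req) as resp:
#       print(json.load(resp))
```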

πŸ“‹ Requirements

  • Python: 3.8 or higher
  • Memory: At least 4GB RAM (8GB recommended)
  • Storage: 2GB free space for models
  • Internet: Required for first-time model download

πŸ› Troubleshooting

Common Issues

  1. "Module not found" errors:

    pip install -r requirements.txt
    
  2. NLTK data missing:

    python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
    
  3. API connection failed:

    • Ensure FastAPI is running on port 8000
    • Check firewall settings
    • Verify no other service is using the port
  4. Large PDF processing slow:

    • Reduce chunk size in advanced settings
    • Use a faster model (t5-small)
    • Ensure sufficient RAM
  5. Model download issues:

    • Check internet connection
    • Clear Hugging Face cache: rm -rf ~/.cache/huggingface

### Performance Tips

- **GPU acceleration**: Install a CUDA-enabled PyTorch build for faster processing
- **Model selection**: Use smaller models for faster results
- **Chunk size**: Smaller chunks process faster but may lose context
- **Memory**: Close other applications to free up RAM

## 🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

πŸ“„ License

This project is open source and available under the MIT License.

πŸ™ Acknowledgments

  • Hugging Face for transformer models
  • Streamlit for the web framework
  • FastAPI for the backend framework
  • The open-source community for various libraries

πŸ“ž Support

For issues, questions, or feature requests:

  1. Check the troubleshooting section
  2. Open an issue on GitHub

Happy summarizing! πŸ“šβœ¨

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference