metadata
title: WAQO
emoji: 🐢
colorFrom: indigo
colorTo: yellow
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Wakili! A quick one!

WAQO - Wakili, A Quick One

A legal assistant chatbot for the Kenya Finance Bill 2025 that provides easy-to-understand explanations of legal concepts and implications.

Features

  • Interactive chat interface for asking questions about the Finance Bill 2025
  • Multi-language support (English, Kiswahili, Luo)
  • Retrieval-Augmented Generation (RAG) that grounds responses in the text of the Bill
  • Friendly, conversational tone with Kenyan context

Setup Instructions

Local Development

  1. Clone the repository:

    git clone https://huggingface.co/spaces/Wanxai/WAQO
    cd WAQO
    
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Create a .env file in the project root with your Google API key:

    GOOGLE_API_KEY=your_api_key_here
    
  5. Download the Finance Bill 2025 PDF:

    • Create a data directory in the project root
    • Place the Finance Bill 2025 PDF in the data directory
    • Name it finance-bill-2025.pdf
  6. Run the application:

    python app.py
    
  7. Access the web interface at http://localhost:7860
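
Before launching, it can help to confirm that steps 4 and 5 took effect. The following is a minimal sketch, not part of the repository: `check_setup` is a hypothetical helper, and it assumes the key is already present in the process environment (for example, exported in the shell or loaded from `.env` by the app) and that the PDF sits at `data/finance-bill-2025.pdf`.

```python
import os
from pathlib import Path

# Assumed location from step 5; adjust if your layout differs.
PDF_PATH = Path("data") / "finance-bill-2025.pdf"


def check_setup(env=None, pdf_path=PDF_PATH):
    """Return a list of human-readable setup problems (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    # The key only needs to be non-empty here; its validity is not checked.
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    if not Path(pdf_path).is_file():
        problems.append(f"missing PDF: {pdf_path}")
    return problems
```

Calling `check_setup()` from the project root returns an empty list when both prerequisites are in place.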

Deploying to Hugging Face Spaces

  1. Fork this repository to your Hugging Face account

  2. In the Hugging Face Space settings, add your Google API key as a secret:

    • Name: GOOGLE_API_KEY
    • Value: Your Google Generative AI API key
  3. Upload the Finance Bill 2025 PDF:

    • Go to the "Files" tab in your Space
    • Create a data directory
    • Upload the PDF file as finance-bill-2025.pdf
  4. The Space rebuilds and deploys automatically, with GOOGLE_API_KEY available to the app as an environment variable

Project Structure

  • app.py: Main application with Gradio interface
  • main.py: FastAPI server entry point
  • app/services/: Core services for the chatbot
    • llm_service.py: Handles interaction with Google's Generative AI
    • vector_store.py: Manages the vector database for RAG
    • document_processor.py: Processes the PDF document
  • app/models/: Data models
  • app/core/: Configuration and utilities
  • data/: Directory for storing the Finance Bill PDF

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Finance Bill RAG System

A Retrieval-Augmented Generation (RAG) system that processes a locally stored Finance Bill PDF and allows users to query it using natural language. The system uses Google's Gemini 1.5 Flash LLM to generate clear, concise responses based on the document content.

Features

  • Automatic PDF processing on startup
  • Multiple PDF text extraction methods (PyPDF and PDFPlumber)
  • Intelligent text chunking for better context retrieval
  • Vector storage using ChromaDB for semantic search
  • Natural language querying using Gemini 1.5 Flash LLM
  • Markdown-formatted responses for readability
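
The repository's chunking logic is not shown here, so the following is only a sketch of one common approach: fixed-size character windows with overlap, so that a sentence cut at a boundary still appears whole in the neighbouring chunk. `chunk_text`, `chunk_size`, and `overlap` are illustrative names and defaults, not the project's API.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into windows of chunk_size characters, each sharing
    `overlap` characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

A larger overlap preserves more context across boundaries at the cost of storing more near-duplicate text in the vector database.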

System Architecture

  • FastAPI Backend: High-performance API with a single query endpoint
  • ChromaDB: Vector database for storing and retrieving document chunks
  • Gemini 1.5 Flash: Advanced LLM for generating human-friendly responses
  • PDF Processing Pipeline: Robust extraction with multiple fallback methods
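
The fallback idea in the last bullet can be sketched without committing to either PDF library. The code below is an illustration under assumptions, not the project's pipeline: `extract_with_fallback` is a hypothetical helper that tries each extractor in order and accepts the first one that returns non-empty text. In practice the extractors would wrap PyPDF and PDFPlumber; here they are stand-ins.

```python
def extract_with_fallback(path, extractors):
    """Try each (name, extractor) pair in order; return (name, text)
    from the first extractor that yields non-empty text."""
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # one failed method should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: returned no text")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


# Stand-in extractors for demonstration only.
def broken_extractor(path):
    raise ValueError("cannot parse")


def working_extractor(path):
    return "sample text"


method, text = extract_with_fallback(
    "data/finance-bill-2025.pdf",
    [("first", broken_extractor), ("second", working_extractor)],
)
print(method)  # second
```

Collecting the per-method errors keeps the final exception informative when every extractor fails.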

Setup

  1. Clone the repository
  2. Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:
    pip install -r requirements.txt
    
  4. Create a .env file and add your Google API key:
    GOOGLE_API_KEY=your_google_api_key
    
  5. Place your Finance Bill PDF in the data directory as finance-bill-2025.pdf
  6. Run the application:
    python main.py
    

API Endpoint

  • POST /query: Query the Finance Bill document
    • Request body (top_k is optional and sets the number of chunks to retrieve):
      {
        "query": "What changes are proposed for income tax?",
        "top_k": 4
      }
      
    • Response format (score is the relevance score of each source chunk):
      {
        "query": "The original question asked",
        "answer": "Markdown-formatted response generated by Gemini",
        "sources": [{
          "content": "The text chunk from the document",
          "metadata": {
            "document_id": "finance-bill-2025",
            "chunk_index": 0,
            "chunk_count": 1
          },
          "score": 0.7167216539382935
        }]
      }
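
To make `top_k` and `score` concrete, here is a toy ranking sketch. It deliberately swaps ChromaDB's embedding search for a bag-of-words cosine similarity so that it runs on the standard library alone. `rank_chunks`, the sample chunks, and the meaning given to the metadata fields are all assumptions, and real scores from the service will differ.

```python
import math
import re
from collections import Counter


def _vectorize(text):
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_chunks(query, chunks, top_k=4):
    """Return up to top_k chunks as source-like dicts, best score first."""
    q = _vectorize(query)
    scored = [
        {
            "content": chunk,
            # Assumed meaning: position of the chunk and total chunk count.
            "metadata": {"chunk_index": i, "chunk_count": len(chunks)},
            "score": _cosine(q, _vectorize(chunk)),
        }
        for i, chunk in enumerate(chunks)
    ]
    scored.sort(key=lambda s: s["score"], reverse=True)
    return scored[:top_k]


# Made-up chunks for illustration only; not text from the Bill.
chunks = [
    "income tax rates for residents",
    "excise duty on imports",
    "tax on digital income",
]
results = rank_chunks("income tax", chunks, top_k=2)
```

Raising `top_k` gives the LLM more context to draw on, but also admits lower-scoring chunks that may be less relevant.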
      

Example Usage

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What changes are proposed for income tax?"
  }'
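
The same request can be made from Python with the standard library. This is a sketch under the assumptions already visible in the curl call (server on `localhost:8000`, `query` and optional `top_k` in the body); `build_query_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed local address, taken from the curl example.
API_URL = "http://localhost:8000/query"


def build_query_request(query, top_k=None, url=API_URL):
    """Build the POST request without sending it."""
    payload = {"query": query}
    if top_k is not None:
        payload["top_k"] = top_k  # omitted entirely when not specified
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query, top_k=None):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_request(query, top_k)) as resp:
        return json.load(resp)
```

With the server running, `ask("What changes are proposed for income tax?")["answer"]` would return the Markdown-formatted answer.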

Interactive Documentation

The system includes Swagger UI documentation at http://localhost:8000/docs where you can interactively test the API.