update README
README.md
@@ -7,8 +7,169 @@ sdk: gradio
 sdk_version: 5.29.0
 app_file: app.py
 pinned: false
-license:
+license: apache-2.0
 short_description: Wakili! A quick one!
 ---
 
-
+# WAQO - Wakili, A Quick One
+
+A legal assistant chatbot for the Kenya Finance Bill 2025 that provides easy-to-understand explanations of legal concepts and implications.
+
+## Features
+
+- Interactive chat interface for querying about the Finance Bill 2025
+- Multi-language support (English, Kiswahili, Luo)
+- RAG (Retrieval-Augmented Generation) system for accurate responses
+- Friendly, conversational tone with Kenyan context
+
+## Setup Instructions
+
+### Local Development
+
+1. Clone the repository:
+   ```bash
+   git clone https://huggingface.co/spaces/Wanxai/WAQO
+   cd WAQO
+   ```
+
+2. Create a virtual environment:
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   ```
+
+3. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+4. Create a `.env` file in the project root with your Google API key:
+   ```
+   GOOGLE_API_KEY=your_api_key_here
+   ```
+
+5. Download the Finance Bill 2025 PDF:
+   - Create a `data` directory in the project root
+   - Place the Finance Bill 2025 PDF in the `data` directory
+   - Name it `finance-bill-2025.pdf`
+
+6. Run the application:
+   ```bash
+   python app.py
+   ```
+
+7. Access the web interface at http://localhost:7860
+
+### Deploying to Hugging Face Spaces
+
+1. Fork this repository to your Hugging Face account
+
+2. In the Hugging Face Space settings, add your Google API key as a secret:
+   - Name: `GOOGLE_API_KEY`
+   - Value: Your Google Generative AI API key
+
+3. Upload the Finance Bill 2025 PDF:
+   - Go to the "Files" tab in your Space
+   - Create a `data` directory
+   - Upload the PDF file as `finance-bill-2025.pdf`
+
+4. The Space will automatically deploy with the correct environment
+
+## Project Structure
+
+- `app.py`: Main application with Gradio interface
+- `main.py`: FastAPI server entry point
+- `app/services/`: Core services for the chatbot
+  - `llm_service.py`: Handles interaction with Google's Generative AI
+  - `vector_store.py`: Manages the vector database for RAG
+  - `document_processor.py`: Processes the PDF document
+- `app/models/`: Data models
+- `app/core/`: Configuration and utilities
+- `data/`: Directory for storing the Finance Bill PDF
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
+
+# Finance Bill RAG System
+
+A Retrieval-Augmented Generation (RAG) system that processes a locally stored Finance Bill PDF and allows users to query it using natural language. The system uses Google's Gemini 1.5 Flash LLM to generate clear, concise responses based on the document content.
+
+## Features
+
+- Automatic PDF processing on startup
+- Multiple PDF text extraction methods (PyPDF and PDFPlumber)
+- Intelligent text chunking for better context retrieval
+- Vector storage using ChromaDB for semantic search
+- Natural language querying using Gemini 1.5 Flash LLM
+- Markdown-formatted responses for readability
+
+## System Architecture
+
+- **FastAPI Backend**: High-performance API with a single query endpoint
+- **ChromaDB**: Vector database for storing and retrieving document chunks
+- **Gemini 1.5 Flash**: Advanced LLM for generating human-friendly responses
+- **PDF Processing Pipeline**: Robust extraction with multiple fallback methods
+
+## Setup
+
+1. Clone the repository
+2. Create a virtual environment (recommended):
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   ```
+3. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+4. Create a `.env` file and add your Google API key:
+   ```
+   GOOGLE_API_KEY=your_google_api_key
+   ```
+5. Place your Finance Bill PDF in the `data` directory as `finance-bill-2025.pdf`
+6. Run the application:
+   ```bash
+   python main.py
+   ```
+
+## API Endpoint
+
+- `POST /query`: Query the Finance Bill document
+  - Request body:
+    ```json
+    {
+      "query": "What changes are proposed for income tax?",
+      "top_k": 4  // Optional, number of chunks to retrieve
+    }
+    ```
+  - Response format:
+    ```json
+    {
+      "query": "The original question asked",
+      "answer": "Markdown-formatted response generated by Gemini",
+      "sources": [{
+        "content": "The text chunk from the document",
+        "metadata": {
+          "document_id": "finance-bill-2025",
+          "chunk_index": 0,
+          "chunk_count": 1
+        },
+        "score": 0.7167216539382935  // Relevance score
+      }]
+    }
+    ```
+
+## Example Usage
+
+```bash
+curl -X POST "http://localhost:8000/query" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "What changes are proposed for income tax?"
+  }'
+```
+
+## Interactive Documentation
+
+The system includes Swagger UI documentation at `http://localhost:8000/docs` where you can interactively test the API.
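
The `POST /query` request documented in the README can also be built from Python. The following is a minimal sketch using only the standard library, not code from the repository. `build_query_request` and `ask` are hypothetical helper names, and the base URL assumes the FastAPI server listens on `localhost:8000` as in the curl example.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example in the README.
BASE_URL = "http://localhost:8000"


def build_query_request(query, top_k=None, base_url=BASE_URL):
    """Build the POST /query request; top_k is optional, as in the README."""
    payload = {"query": query}
    if top_k is not None:
        payload["top_k"] = top_k
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query, top_k=None, base_url=BASE_URL):
    """Send the query and return the decoded JSON body (needs a running server)."""
    request = build_query_request(query, top_k, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `ask("What changes are proposed for income tax?", top_k=4)` should return a dictionary with the `query`, `answer`, and `sources` keys shown in the response format, provided the server behaves as the README describes.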
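
Each entry in `sources` carries `chunk_index` and `chunk_count` metadata, and the README lists text chunking as a feature without describing the method. The sketch below shows one common approach, fixed-size windows with overlap, purely as an illustration. It is not the repository's `document_processor.py`, and `chunk_text`, `chunk_size`, and `overlap` are hypothetical names with arbitrary defaults.

```python
def chunk_text(text, document_id, chunk_size=1000, overlap=200):
    """Split text into overlapping character windows with README-style metadata."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []

    # Each window starts `step` characters after the previous one,
    # so consecutive chunks share `overlap` characters.
    step = chunk_size - overlap
    last_start = max(len(text) - overlap, 1)
    pieces = [text[start:start + chunk_size] for start in range(0, last_start, step)]

    return [
        {
            "content": piece,
            "metadata": {
                "document_id": document_id,
                "chunk_index": index,
                "chunk_count": len(pieces),
            },
        }
        for index, piece in enumerate(pieces)
    ]
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk, at the cost of storing some text twice.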