Hermit11 committed on
Commit 20dad14 · verified · 1 parent: 8461c80

update README

Files changed (1)
  1. README.md +163 -2
README.md CHANGED
@@ -7,8 +7,169 @@ sdk: gradio
  sdk_version: 5.29.0
  app_file: app.py
  pinned: false
- license: unknown
+ license: apache-2.0
  short_description: Wakili! A quick one!
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # WAQO - Wakili, A Quick One
+
+ A legal assistant chatbot for the Kenya Finance Bill 2025 that provides easy-to-understand explanations of legal concepts and implications.
+
+ ## Features
+
+ - Interactive chat interface for querying about the Finance Bill 2025
+ - Multi-language support (English, Kiswahili, Luo)
+ - RAG (Retrieval-Augmented Generation) system for accurate responses
+ - Friendly, conversational tone with Kenyan context
+
+ ## Setup Instructions
+
+ ### Local Development
+
+ 1. Clone the repository:
+ ```bash
+ git clone https://huggingface.co/spaces/Wanxai/WAQO
+ cd WAQO
+ ```
+
+ 2. Create a virtual environment:
+ ```bash
+ python -m venv venv
+ source venv/bin/activate # On Windows: venv\Scripts\activate
+ ```
+
+ 3. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 4. Create a `.env` file in the project root with your Google API key:
+ ```
+ GOOGLE_API_KEY=your_api_key_here
+ ```
+
+ 5. Download the Finance Bill 2025 PDF:
+    - Create a `data` directory in the project root
+    - Place the Finance Bill 2025 PDF in the `data` directory
+    - Name it `finance-bill-2025.pdf`
+
+ 6. Run the application:
+ ```bash
+ python app.py
+ ```
+
+ 7. Access the web interface at http://localhost:7860
+
+ ### Deploying to Hugging Face Spaces
+
+ 1. Fork this repository to your Hugging Face account
+
+ 2. In the Hugging Face Space settings, add your Google API key as a secret:
+    - Name: `GOOGLE_API_KEY`
+    - Value: Your Google Generative AI API key
+
+ 3. Upload the Finance Bill 2025 PDF:
+    - Go to the "Files" tab in your Space
+    - Create a `data` directory
+    - Upload the PDF file as `finance-bill-2025.pdf`
+
+ 4. The Space will automatically deploy with the correct environment
+
+ ## Project Structure
+
+ - `app.py`: Main application with Gradio interface
+ - `main.py`: FastAPI server entry point
+ - `app/services/`: Core services for the chatbot
+   - `llm_service.py`: Handles interaction with Google's Generative AI
+   - `vector_store.py`: Manages the vector database for RAG
+   - `document_processor.py`: Processes the PDF document
+ - `app/models/`: Data models
+ - `app/core/`: Configuration and utilities
+ - `data/`: Directory for storing the Finance Bill PDF
+
+ ## License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.
+
+ # Finance Bill RAG System
+
+ A Retrieval-Augmented Generation (RAG) system that processes a locally stored Finance Bill PDF and allows users to query it using natural language. The system uses Google's Gemini 1.5 Flash LLM to generate clear, concise responses based on the document content.
+
+ ## Features
+
+ - Automatic PDF processing on startup
+ - Multiple PDF text extraction methods (PyPDF and PDFPlumber)
+ - Intelligent text chunking for better context retrieval
+ - Vector storage using ChromaDB for semantic search
+ - Natural language querying using Gemini 1.5 Flash LLM
+ - Markdown-formatted responses for readability
+
+ ## System Architecture
+
+ - **FastAPI Backend**: High-performance API with a single query endpoint
+ - **ChromaDB**: Vector database for storing and retrieving document chunks
+ - **Gemini 1.5 Flash**: Advanced LLM for generating human-friendly responses
+ - **PDF Processing Pipeline**: Robust extraction with multiple fallback methods
+
+ ## Setup
+
+ 1. Clone the repository
+ 2. Create a virtual environment (recommended):
+ ```bash
+ python -m venv venv
+ source venv/bin/activate # On Windows: venv\Scripts\activate
+ ```
+ 3. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 4. Create a `.env` file and add your Google API key:
+ ```
+ GOOGLE_API_KEY=your_google_api_key
+ ```
+ 5. Place your Finance Bill PDF in the `data` directory as `finance-bill-2025.pdf`
+ 6. Run the application:
+ ```bash
+ python main.py
+ ```
+
+ ## API Endpoint
+
+ - `POST /query`: Query the Finance Bill document
+ - Request body:
+ ```json
+ {
+   "query": "What changes are proposed for income tax?",
+   "top_k": 4 // Optional, number of chunks to retrieve
+ }
+ ```
+ - Response format:
+ ```json
+ {
+   "query": "The original question asked",
+   "answer": "Markdown-formatted response generated by Gemini",
+   "sources": [{
+     "content": "The text chunk from the document",
+     "metadata": {
+       "document_id": "finance-bill-2025",
+       "chunk_index": 0,
+       "chunk_count": 1
+     },
+     "score": 0.7167216539382935 // Relevance score
+   }]
+ }
+ ```
+
+ ## Example Usage
+
+ ```bash
+ curl -X POST "http://localhost:8000/query" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "query": "What changes are proposed for income tax?"
+   }'
+ ```
+
+ ## Interactive Documentation
+
+ The system includes Swagger UI documentation at `http://localhost:8000/docs` where you can interactively test the API.
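Both setup paths in the README end with `GOOGLE_API_KEY` available to the app. Locally it comes from a `.env` file. On Spaces it comes from a secret, which Spaces exposes to the running app as an environment variable. The README does not show how `app.py` or `main.py` actually reads the key, so what follows is only a minimal standard-library sketch under that assumption. The helper names `read_env_file` and `get_google_api_key` are hypothetical, and the precedence rule (an already-set environment variable wins over the file) is a design choice of this sketch.

```python
import os
from pathlib import Path

# Hypothetical helpers for illustration; not taken from the WAQO codebase.


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments.

    Surrounding single or double quotes on a value are stripped. Features such
    as `export` prefixes, inline comments, and multi-line values are not handled.
    """
    values: dict[str, str] = {}
    if not path.is_file():
        return values  # a missing .env is fine, e.g. on Spaces
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_google_api_key(env_path: Path = Path(".env")) -> str:
    """Return GOOGLE_API_KEY, preferring the process environment over the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or read_env_file(env_path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set in the environment or in .env")
    return key
```

Run from the project root, `get_google_api_key()` returns the value from `.env` locally. On a Space, no file is needed because the secret's environment variable is found first.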
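The response format's `sources[].metadata` fields (`document_id`, `chunk_index`, `chunk_count`) show that the bill's extracted text is stored as numbered chunks. The README calls its approach "intelligent text chunking" but does not describe it. The sketch below therefore swaps in plain fixed-size character windows with overlap, only to show how metadata in that shape could be produced. `chunk_text` and its `chunk_size`/`overlap` defaults are hypothetical, not the project's settings.

```python
from typing import Any

# Hypothetical chunker: fixed-size character windows with overlap, standing in for
# the README's unspecified "intelligent text chunking".


def chunk_text(text: str, document_id: str, chunk_size: int = 1000,
               overlap: int = 200) -> list[dict[str, Any]]:
    """Split text into overlapping windows tagged with README-style metadata."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap  # how far each window start advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return [
        {
            "content": piece,
            "metadata": {
                "document_id": document_id,
                "chunk_index": index,
                "chunk_count": len(pieces),
            },
        }
        for index, piece in enumerate(pieces)
    ]
```

Each window repeats the last `overlap` characters of the previous one. A short passage that straddles a boundary can therefore still appear intact in a single chunk. A text shorter than `chunk_size` yields one chunk with `chunk_index` 0 and `chunk_count` 1, matching the README's example metadata.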
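The request's `top_k` and each returned source's `score` come from ranking stored chunks against the query. In this project ChromaDB does that ranking. Purely as an illustration, the sketch below takes chunks that carry toy, pre-computed embedding vectors, ranks them by cosine similarity, and returns the best `top_k` in the README's `sources` shape. The `embedding` field, the function names, and the use of cosine similarity as `score` are all assumptions. The README states neither the distance metric nor how the relevance score is derived.

```python
import math
from typing import Any, Sequence

# Illustration only: in the project, ChromaDB performs retrieval. Here each chunk is
# assumed to carry a pre-computed "embedding" vector, and score = cosine similarity.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_sources(query_vector: Sequence[float],
                  chunks: Sequence[dict[str, Any]],
                  top_k: int = 4) -> list[dict[str, Any]]:
    """Rank chunks by similarity to the query and return the best top_k, highest first."""
    scored = [
        {
            "content": chunk["content"],
            "metadata": chunk["metadata"],
            "score": cosine_similarity(query_vector, chunk["embedding"]),
        }
        for chunk in chunks
    ]
    scored.sort(key=lambda source: source["score"], reverse=True)
    return scored[:top_k]
```

In this sketch a higher score means a closer match. Under a distance-based metric, smaller values would be better. A figure like the README's `0.7167216539382935` is therefore only meaningful relative to the configuration that produced it.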
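The README's `POST /query` endpoint can also be called from Python. The sketch below uses only the standard library and assumes the FastAPI server started with `python main.py` is listening on `http://localhost:8000`, as in the curl example. `build_query_payload`, `build_query_request`, and `query_finance_bill` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumption: the FastAPI server (`python main.py`) listens on port 8000, as in the
# README's curl example. Helper names below are hypothetical.
BASE_URL = "http://localhost:8000"


def build_query_payload(query: str, top_k: Optional[int] = None) -> dict[str, Any]:
    """JSON body for POST /query; top_k is included only when given."""
    payload: dict[str, Any] = {"query": query}
    if top_k is not None:
        payload["top_k"] = top_k
    return payload


def build_query_request(query: str, top_k: Optional[int] = None,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    body = json.dumps(build_query_payload(query, top_k)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_finance_bill(query: str, top_k: Optional[int] = None,
                       base_url: str = BASE_URL, timeout: float = 30.0) -> dict[str, Any]:
    """Send the query and return the decoded JSON response (query, answer, sources)."""
    request = build_query_request(query, top_k, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `query_finance_bill("What changes are proposed for income tax?", top_k=4)["answer"]` should return the Markdown answer. Each entry in the response's `"sources"` list carries the chunk text, metadata, and score. Leaving `top_k` as `None` omits the field, so the server applies its own default.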