Mehardeep7 committed
Commit d29a257 · 1 Parent(s): f5fc0c6

Deploy RAG pipeline to Hugging Face Spaces

Files changed (3):
  1. README.md +75 -10
  2. app.py +401 -0
  3. requirements.txt +10 -0
README.md CHANGED
@@ -1,12 +1,77 @@
- ---
- title: Rag Pipeline Llm
- emoji:
- colorFrom: gray
- colorTo: red
- sdk: gradio
- sdk_version: 5.44.1
- app_file: app.py
- pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🔍 RAG Pipeline For LLMs 🚀
+
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Mehardeep79/rag-pipeline-llm)
+ [![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://python.org)
+
+ ## 📖 Project Overview
+
+ A **Retrieval-Augmented Generation (RAG)** pipeline that combines semantic search with extractive question answering. The system fetches a Wikipedia article, splits it into searchable chunks, and uses pre-trained transformer models to return accurate, context-aware answers.
+
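+ As a rough illustration of the ingestion step (mirroring `get_wikipedia_content` and `split_text` in `app.py`; the helper name and topic below are illustrative):
+
+ ```python
+ import wikipedia
+
+ def fetch_article(topic):
+     """Return the plain-text body of a Wikipedia article, or None on failure."""
+     try:
+         return wikipedia.page(topic).content
+     except (wikipedia.exceptions.PageError, wikipedia.exceptions.DisambiguationError):
+         return None
+
+ text = fetch_article("Artificial intelligence")
+ if text:
+     # app.py then packs sentences into token-bounded, overlapping chunks
+     # using the embedding model's tokenizer (see split_text).
+     sentences = text.split(". ")
+     print(f"Fetched {len(text.split())} words across {len(sentences)} sentences")
+ ```
+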
+ ## ✨ Key Features
+
+ - 📚 **Dynamic Knowledge Retrieval** from Wikipedia with error handling
+ - 🧮 **Semantic Search** using sentence transformers (no keyword dependency)
+ - ⚡ **Fast Vector Similarity** with FAISS indexing (sub-second search)
+ - 🤖 **Intelligent Answer Generation** using pre-trained QA models
+ - 📊 **Confidence Scoring** for answer quality assessment
+ - 🎛️ **Customizable Parameters** (chunk size, retrieval count, overlap)
+ - ✂️ **Smart Text Chunking** with overlapping segments for context preservation
+
+ ## 🏗️ Architecture
+
+ ```
+ User Query → Embedding → FAISS Search → Retrieve Chunks → QA Model → Answer + Confidence
+ ```
+
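+ A minimal sketch of that query-time flow (assuming the article has already been chunked, embedded, and indexed, as `app.py` does at processing time; the function name is illustrative):
+
+ ```python
+ import numpy as np
+
+ def ask(question, embedding_model, index, chunks, qa_pipeline, k=3):
+     """Embed the query, retrieve the k nearest chunks, and extract an answer."""
+     query_vec = embedding_model.encode([question])            # 768-dim query embedding
+     distances, indices = index.search(np.array(query_vec), k)
+     retrieved = [chunks[i] for i in indices[0]]                # top-k most similar chunks
+     result = qa_pipeline(question=question, context=" ".join(retrieved))
+     return result["answer"], result["score"]                   # answer text + confidence
+ ```
+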
+ ## 🤖 AI Models Used
+
+ - **📏 Text Chunking**: `sentence-transformers/all-mpnet-base-v2` tokenizer
+ - **🧮 Vector Embeddings**: `sentence-transformers/all-mpnet-base-v2` (768-dimensional)
+ - **❓ Question Answering**: `deepset/roberta-base-squad2` (RoBERTa fine-tuned on SQuAD 2.0)
+ - **🔍 Vector Search**: FAISS `IndexFlatL2` for L2-distance similarity (loading sketched below)
+
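+ Loading these components and building the index takes only a few lines (the same calls `app.py` makes in `load_models` and `process_article`; the sample chunks are illustrative):
+
+ ```python
+ import faiss
+ import numpy as np
+ from sentence_transformers import SentenceTransformer
+ from transformers import pipeline
+
+ embedder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
+ qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
+
+ chunks = ["FAISS is a library for efficient similarity search.",
+           "RoBERTa builds on BERT with a more robust pretraining recipe."]
+ embeddings = embedder.encode(chunks)                  # shape: (n_chunks, 768)
+ index = faiss.IndexFlatL2(embeddings.shape[1])        # exact L2-distance index
+ index.add(np.array(embeddings, dtype="float32"))
+ ```
+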
+ ## 🚀 How to Use
+
+ 1. **📖 Process Article**: Enter any Wikipedia topic and configure the chunk settings
+ 2. **❓ Ask Questions**: Switch to the Q&A tab and enter your question
+ 3. **📊 View Results**: Explore answers with confidence scores and similarity metrics
+ 4. **🔍 Analyze**: Check the retrieved context and visualization analytics
+
+ ## 💡 Example Usage
+
+ ```
+ Topic: "Artificial Intelligence"
+ Question: "What is machine learning?"
+ Answer: "Machine learning is a subset of artificial intelligence..."
+ Confidence: 89.7%
+ ```
+
+ ## 🔧 Configuration Options
+
+ - **Chunk Size**: 128-512 tokens (default: 256)
+ - **Overlap**: 10-50 tokens (default: 20)
+ - **Retrieval Count**: 1-10 chunks (default: 3); see the sketch below for how these map onto the code
+
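+ These knobs map directly onto two calls in `app.py` (defaults shown; `document`, `index`, `np`, and `query_embedding` are reused from the sketches above):
+
+ ```python
+ # Chunk size and overlap are applied when the article is split into chunks...
+ chunks = split_text(document, chunk_size=256, chunk_overlap=20)
+
+ # ...and the retrieval count is the k passed to the FAISS search.
+ distances, indices = index.search(np.array(query_embedding), 3)
+ ```
+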
+ ## 📊 Performance
+
+ - **Search Speed**: sub-second retrieval for indexes of 1000+ chunks (see the timing sketch below)
+ - **Confidence Scoring**: every answer carries a score, so low-quality extractions can be flagged
+ - **Memory Efficient**: bounded chunk sizes keep contexts within the QA model's token limit
+
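+ Retrieval latency can be checked with a simple timer around the FAISS search (illustrative only; actual numbers depend on hardware and chunk count, and the names continue from the sketches above):
+
+ ```python
+ import time
+
+ start = time.perf_counter()
+ distances, indices = index.search(np.array(query_embedding), 3)
+ elapsed_ms = (time.perf_counter() - start) * 1000
+ print(f"Retrieved top-3 chunks in {elapsed_ms:.1f} ms")
+ ```
+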
+ ## 🔗 Links
+
+ - **📝 Full Project**: [GitHub Repository](https://github.com/Mehardeep79/RAG_Pipeline_LLM)
+ - **📓 Jupyter Notebook**: Complete implementation with explanations
+ - **🌐 Streamlit App**: Alternative web interface
+
+ ## 🤝 Credits
+
+ Built with ❤️ using:
+ - 🤗 **Hugging Face** for transformers and model hosting
+ - ⚡ **FAISS** for efficient vector search
+ - 🎨 **Gradio** for the interactive interface
+ - 📖 **Wikipedia API** for knowledge content
+
  ---

+ **⭐ If you find this useful, please give it a star on GitHub!**
app.py ADDED
@@ -0,0 +1,401 @@
+ import gradio as gr
+ import numpy as np
+ import wikipedia
+ from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
+ from sentence_transformers import SentenceTransformer
+ import faiss
+ import plotly.graph_objects as go
+ import plotly.express as px
+ from plotly.subplots import make_subplots
+ import time
+ import pandas as pd
+ import warnings
+ warnings.filterwarnings("ignore")
+
+ # Global variables to store models and data
+ embedding_model = None
+ qa_pipeline = None
+ chunks = None
+ embeddings = None
+ index = None
+ document = None
+
+ def load_models():
+     """Load and cache the ML models"""
+     global embedding_model, qa_pipeline
+
+     if embedding_model is None:
+         print("🤖 Loading embedding model...")
+         embedding_model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
+
+         print("🤖 Loading QA model...")
+         qa_tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
+         qa_model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
+         qa_pipeline = pipeline("question-answering", model=qa_model, tokenizer=qa_tokenizer)
+
+         print("✅ Models loaded successfully!")
+
+     return "✅ Models are ready!"
+
+ def get_wikipedia_content(topic):
+     """Fetch Wikipedia content"""
+     try:
+         page = wikipedia.page(topic)
+         return page.content, f"✅ Successfully fetched '{topic}' article"
+     except wikipedia.exceptions.PageError:
+         return None, f"❌ Page '{topic}' not found. Please try a different topic."
+     except wikipedia.exceptions.DisambiguationError as e:
+         return None, f"⚠️ Ambiguous topic. Try one of these: {', '.join(e.options[:5])}"
+
+ def split_text(text, chunk_size=256, chunk_overlap=20):
+     """Split text into overlapping chunks"""
+     tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
+
+     # Split into sentences first
+     sentences = text.split('. ')
+     chunks = []
+     current_chunk = ""
+
+     for sentence in sentences:
+         test_chunk = current_chunk + ". " + sentence if current_chunk else sentence
+         test_tokens = tokenizer.tokenize(test_chunk)
+
+         if len(test_tokens) > chunk_size:
+             if current_chunk:
+                 chunks.append(current_chunk.strip())
+
+                 # Add overlap
+                 if chunk_overlap > 0 and chunks:
+                     overlap_tokens = tokenizer.tokenize(current_chunk)
+                     if len(overlap_tokens) > chunk_overlap:
+                         overlap_start = len(overlap_tokens) - chunk_overlap
+                         overlap_text = tokenizer.convert_tokens_to_string(overlap_tokens[overlap_start:])
+                         current_chunk = overlap_text + ". " + sentence
+                     else:
+                         current_chunk = sentence
+                 else:
+                     current_chunk = sentence
+             else:
+                 current_chunk = sentence
+         else:
+             current_chunk = test_chunk
+
+     if current_chunk.strip():
+         chunks.append(current_chunk.strip())
+
+     return chunks
+
+ def process_article(topic, chunk_size, chunk_overlap):
+     """Process Wikipedia article into chunks and embeddings"""
+     global chunks, embeddings, index, document
+
+     if not topic.strip():
+         return "⚠️ Please enter a topic first!", None, ""
+
+     # Load models first
+     load_models()
+
+     # Fetch content
+     document, message = get_wikipedia_content(topic)
+
+     if document is None:
+         return message, None, ""
+
+     # Process text
+     chunks = split_text(document, int(chunk_size), int(chunk_overlap))
+
+     # Create embeddings
+     embeddings = embedding_model.encode(chunks)
+
+     # Build FAISS index
+     dimension = embeddings.shape[1]
+     index = faiss.IndexFlatL2(dimension)
+     index.add(np.array(embeddings))
+
+     # Create summary stats
+     chunk_lengths = [len(chunk.split()) for chunk in chunks]
+     summary = f"""
+ 📊 **Processing Summary:**
+ - **Total chunks**: {len(chunks)}
+ - **Embedding dimension**: {dimension}
+ - **Average chunk length**: {np.mean(chunk_lengths):.1f} words
+ - **Min/Max chunk length**: {min(chunk_lengths)}/{max(chunk_lengths)} words
+ - **Document length**: {len(document.split())} words
+
+ ✅ Ready for questions!
+ """
+
+     return f"✅ Successfully processed '{topic}' into {len(chunks)} chunks!", create_chunk_visualization(), summary
+
+ def create_chunk_visualization():
+     """Create chunk length distribution plot"""
+     if chunks is None:
+         return None
+
+     chunk_lengths = [len(chunk.split()) for chunk in chunks]
+
+     fig = make_subplots(
+         rows=1, cols=2,
+         subplot_titles=("📏 Chunk Length Distribution", "📊 Statistical Summary"),
+         specs=[[{"type": "bar"}, {"type": "box"}]]
+     )
+
+     # Histogram
+     fig.add_trace(
+         go.Histogram(x=chunk_lengths, nbinsx=15, name="Distribution",
+                      marker_color="skyblue", opacity=0.7),
+         row=1, col=1
+     )
+
+     # Box plot
+     fig.add_trace(
+         go.Box(y=chunk_lengths, name="Statistics",
+                marker_color="lightgreen", boxmean=True),
+         row=1, col=2
+     )
+
+     fig.update_layout(height=400, showlegend=False, title="📊 Chunk Analysis")
+
+     return fig
+
+ def answer_question(question, k_retrieval):
+     """Answer question using RAG pipeline"""
+     global chunks, embeddings, index, qa_pipeline
+
+     if chunks is None or index is None:
+         return "⚠️ Please process an article first!", None, "", ""
+
+     if not question.strip():
+         return "⚠️ Please enter a question!", None, "", ""
+
+     # Get query embedding
+     query_embedding = embedding_model.encode([question])
+
+     # Search
+     distances, indices = index.search(np.array(query_embedding), int(k_retrieval))
+     retrieved_chunks = [chunks[i] for i in indices[0]]
+
+     # Generate answer
+     context = " ".join(retrieved_chunks)
+     answer = qa_pipeline(question=question, context=context)
+
+     # Format results
+     confidence = answer['score']
+
+     # Determine confidence level
+     if confidence >= 0.8:
+         confidence_emoji = "🟢"
+         confidence_text = "Very High"
+     elif confidence >= 0.6:
+         confidence_emoji = "🔵"
+         confidence_text = "High"
+     elif confidence >= 0.4:
+         confidence_emoji = "🟡"
+         confidence_text = "Medium"
+     else:
+         confidence_emoji = "🔴"
+         confidence_text = "Low"
+
+     # Format answer
+     formatted_answer = f"""
+ 🤖 **Answer**: {answer['answer']}
+
+ {confidence_emoji} **Confidence**: {confidence:.1%} ({confidence_text})
+ 📏 **Answer Length**: {len(answer['answer'])} characters
+ 🔍 **Chunks Used**: {len(retrieved_chunks)}
+ """
+
+     # Format retrieved chunks (L2 distance is mapped to a 0-1 similarity score)
+     retrieved_text = "📋 **Retrieved Context Chunks:**\n\n"
+     for i, chunk in enumerate(retrieved_chunks):
+         similarity = 1 / (1 + distances[0][i])
+         retrieved_text += f"**Chunk {i+1}** (Similarity: {similarity:.3f}):\n{chunk}\n\n---\n\n"
+
+     # Create similarity visualization
+     similarity_scores = 1 / (1 + distances[0])
+     similarity_plot = create_similarity_plot(similarity_scores)
+
+     return formatted_answer, similarity_plot, retrieved_text, create_confidence_gauge(confidence)
+
+ def create_similarity_plot(similarity_scores):
+     """Create similarity scores bar chart"""
+     fig = go.Figure(data=[
+         go.Bar(x=[f"Rank {i+1}" for i in range(len(similarity_scores))],
+                y=similarity_scores,
+                marker_color=['gold', 'silver', '#CD7F32'][:len(similarity_scores)],
+                text=[f'{score:.3f}' for score in similarity_scores],
+                textposition='auto')
+     ])
+
+     fig.update_layout(
+         title="🎯 Retrieved Chunks Similarity Scores",
+         xaxis_title="Retrieved Chunk Rank",
+         yaxis_title="Similarity Score",
+         height=400
+     )
+
+     return fig
+
+ def create_confidence_gauge(confidence):
+     """Create confidence gauge visualization"""
+     fig = go.Figure(go.Indicator(
+         mode = "gauge+number+delta",
+         value = confidence * 100,
+         domain = {'x': [0, 1], 'y': [0, 1]},
+         title = {'text': "🎯 Answer Confidence (%)"},
+         delta = {'reference': 80},
+         gauge = {
+             'axis': {'range': [None, 100]},
+             'bar': {'color': "darkblue"},
+             'steps': [
+                 {'range': [0, 20], 'color': "red"},
+                 {'range': [20, 40], 'color': "orange"},
+                 {'range': [40, 60], 'color': "yellow"},
+                 {'range': [60, 80], 'color': "lightgreen"},
+                 {'range': [80, 100], 'color': "green"}
+             ],
+             'threshold': {
+                 'line': {'color': "black", 'width': 4},
+                 'thickness': 0.75,
+                 'value': 90
+             }
+         }
+     ))
+
+     fig.update_layout(height=400)
+     return fig
+
+ def clear_data():
+     """Clear all processed data"""
+     global chunks, embeddings, index, document
+     chunks = None
+     embeddings = None
+     index = None
+     document = None
+     return "🗑️ Data cleared! Ready for new article.", None, "", "", None, None, ""
+
+ # Create Gradio interface optimized for Hugging Face Spaces
+ def create_interface():
+     """Create the main Gradio interface"""
+
+     with gr.Blocks(
+         title="🔍 RAG Pipeline For LLMs",
+         theme=gr.themes.Soft(),
+     ) as interface:
+
+         # Header
+         gr.Markdown("""
+         # 🔍 RAG Pipeline For LLMs 🚀
+
+         <div style="text-align: center; color: #666; margin-bottom: 2rem;">
+         An intelligent Q&A system powered by 🤗 Hugging Face, 📖 Wikipedia, and ⚡ FAISS vector search
+         </div>
+         """)
+
+         with gr.Tab("📖 Article Processing"):
+             with gr.Row():
+                 with gr.Column(scale=2):
+                     gr.Markdown("### 📋 Step 1: Configure & Process Article")
+
+                     topic_input = gr.Textbox(
+                         label="📖 Wikipedia Topic",
+                         placeholder="e.g., Artificial Intelligence, Climate Change, Python Programming",
+                         info="Enter any topic available on Wikipedia"
+                     )
+
+                     with gr.Row():
+                         chunk_size = gr.Slider(
+                             label="📏 Chunk Size (tokens)",
+                             minimum=128,
+                             maximum=512,
+                             value=256,
+                             step=32,
+                             info="Larger chunks = more context, smaller chunks = more precision"
+                         )
+
+                         chunk_overlap = gr.Slider(
+                             label="🔗 Chunk Overlap (tokens)",
+                             minimum=10,
+                             maximum=50,
+                             value=20,
+                             step=5,
+                             info="Overlap helps maintain context between chunks"
+                         )
+
+                     process_btn = gr.Button("🔄 Fetch & Process Article", variant="primary", size="lg")
+
+                     processing_status = gr.Textbox(
+                         label="📊 Processing Status",
+                         interactive=False
+                     )
+
+                 with gr.Column(scale=1):
+                     processing_summary = gr.Markdown("### 📈 Processing Summary\n*Process an article to see statistics*")
+
+             chunk_plot = gr.Plot(label="📊 Chunk Analysis Visualization")
+
+         with gr.Tab("❓ Question Answering"):
+             with gr.Row():
+                 with gr.Column(scale=2):
+                     gr.Markdown("### 🎯 Step 2: Ask Your Question")
+
+                     question_input = gr.Textbox(
+                         label="❓ Your Question",
+                         placeholder="e.g., What is the main concept? How does it work?",
+                         info="Ask any question about the processed article"
+                     )
+
+                     k_retrieval = gr.Slider(
+                         label="🔍 Number of Chunks to Retrieve",
+                         minimum=1,
+                         maximum=10,
+                         value=3,
+                         step=1,
+                         info="More chunks = broader context, fewer chunks = more focused"
+                     )
+
+                     answer_btn = gr.Button("🎯 Get Answer", variant="primary", size="lg")
+
+                 with gr.Column(scale=1):
+                     gr.Markdown("### 💡 Tips\n- Process an article first\n- Ask specific questions\n- Adjust retrieval count for better results")
+
+             answer_output = gr.Markdown(label="🤖 Generated Answer")
+
+             with gr.Row():
+                 similarity_plot = gr.Plot(label="🎯 Similarity Scores")
+                 confidence_gauge = gr.Plot(label="📊 Confidence Meter")
+
+         with gr.Tab("📋 Retrieved Context"):
+             retrieved_chunks = gr.Markdown(
+                 label="📄 Retrieved Chunks",
+                 value="*Ask a question to see retrieved context chunks*"
+             )
+
+         # Event handlers
+         process_btn.click(
+             fn=process_article,
+             inputs=[topic_input, chunk_size, chunk_overlap],
+             outputs=[processing_status, chunk_plot, processing_summary]
+         )
+
+         answer_btn.click(
+             fn=answer_question,
+             inputs=[question_input, k_retrieval],
+             outputs=[answer_output, similarity_plot, retrieved_chunks, confidence_gauge]
+         )
+
+         # Footer
+         gr.Markdown("""
+         ---
+         <div style="text-align: center; color: #666; padding: 1rem;">
+         🔍 RAG Pipeline Demo | Built with ❤️ using Gradio, Hugging Face, and FAISS<br>
+         🤗 Models: sentence-transformers/all-mpnet-base-v2 | deepset/roberta-base-squad2
+         </div>
+         """)
+
+     return interface
+
+ # Launch the app for Hugging Face Spaces
+ if __name__ == "__main__":
+     interface = create_interface()
+     interface.launch()
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ transformers>=4.21.0
+ sentence-transformers>=2.2.0
+ torch>=1.11.0
+ faiss-cpu>=1.7.0
+ wikipedia>=1.4.0
+ gradio>=4.0.0
+ plotly>=5.0.0
+ numpy>=1.21.0
+ scipy>=1.7.0
+ pandas>=1.3.0