Spaces:

umar-100
/

pdf-QA-bot

Sleeping

pdf-QA-bot / README.md

Upload folder using huggingface_hub

d8cba91 verified 8 months ago

1.55 kB

	---
	title: QA-bot
	app_file: app.py
	sdk: gradio
	sdk_version: 4.44.0
	---
	# PDF Question-Answering App using LangChain, Pinecone, and Mistral

	This project is a RAG app designed to perform question-answering (QA) on PDF documents. It uses the `LangChain` framework for embedding, `Pinecone` for vector storage, and the `mistral` language model for generating responses to user queries.

	## Features
	- PDF Handling: Load and split PDF files into manageable chunks for processing.
	- Embeddings: I am using the `SentenceTransformerEmbeddings` to create embeddings for document chunks.
	- Vector Storage: Pinecone is used to store document embeddings and efficiently retrieve relevant chunks based on user questions.
	- LLM Integration: I tried using LLMs locally using `Ollama`but due to lack of compute resources I used `mistral` for faster and better responses.
	- Environment Variables: Secrets like API keys are securely managed using `.env` files.

	## Requirements
	- Python 3.12
	- Run `pip install -r requirements.txt`
	- The following teck stack is used:
	- `langchain`
	- `pinecone` Make sure to sign up and create Pinecone API key
	- `Mistral API`


	## Setup

	### 1. Clone the Repository
	```bash
	git clone https://github.com/m-umar-j/RAG-APP
	cd RAG-APP
	```
	### 2. install the requirements using
	```bash
	pip install -r requirements.txt`
	```
	### 3. create .env file in your root directory and add pinecone API key

	``` makefile
	PINECONE_API_KEY=your-pinecone-api-key
	```
	### 4. modify paths

	`file_path = "/path/to/data.pdf"`