pdf-QA-bot / README.md
umar-100's picture
Upload folder using huggingface_hub
d8cba91 verified

A newer version of the Gradio SDK is available: 5.29.0

Upgrade
metadata
title: QA-bot
app_file: app.py
sdk: gradio
sdk_version: 4.44.0

PDF Question-Answering App using LangChain, Pinecone, and Mistral

This project is a RAG app designed to perform question-answering (QA) on PDF documents. It uses the LangChain framework for embedding, Pinecone for vector storage, and the mistral language model for generating responses to user queries.

Features

  • PDF Handling: Load and split PDF files into manageable chunks for processing.
  • Embeddings: I am using the SentenceTransformerEmbeddings to create embeddings for document chunks.
  • Vector Storage: Pinecone is used to store document embeddings and efficiently retrieve relevant chunks based on user questions.
  • LLM Integration: I tried using LLMs locally using Ollamabut due to lack of compute resources I used mistral for faster and better responses.
  • Environment Variables: Secrets like API keys are securely managed using .env files.

Requirements

  • Python 3.12
  • Run pip install -r requirements.txt
  • The following teck stack is used:
    • langchain
    • pinecone Make sure to sign up and create Pinecone API key
    • Mistral API

Setup

1. Clone the Repository

git clone https://github.com/m-umar-j/RAG-APP
cd RAG-APP

2. install the requirements using

pip install -r requirements.txt`

3. create .env file in your root directory and add pinecone API key

PINECONE_API_KEY=your-pinecone-api-key

4. modify paths

file_path = "/path/to/data.pdf"