---
title: Mistral
emoji: 
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: apache-2.0
---

# AI-powered Web Search and PDF Chat Assistant

This application is a versatile AI-powered assistant that combines web search capabilities with PDF document analysis. It provides an interactive chat interface for users to ask questions, search the web, and analyze uploaded PDF documents.

## Features

- Web search functionality
- PDF document upload and analysis
- Chat interface for asking questions
- Support for multiple language models (including Mistral, Mixtral, and Llama)
- Adjustable temperature and number of API calls for tuning responses
- Document management (upload, delete, refresh)
- Entity-specific summary generation

## Requirements

- Python 3.8+ (Gradio 4.x does not support Python 3.7)
- Gradio
- Hugging Face Transformers
- FAISS
- DuckDuckGo Search
- LangChain
- Llama Parse
- Pydantic

## Installation

1. Clone the repository
2. Install the required dependencies listed under Requirements.


## Overview

This project combines large language models with web search and PDF document analysis in a single chat assistant. Users can query their uploaded PDF documents or use web search to get answers to their questions.

## Features

- **PDF Document Chat**: Upload and interact with multiple PDF documents.
- **Web Search Integration**: Option to use web search for answering queries.
- **Multiple AI Models**: Choose from a selection of powerful language models.
- **Customizable Responses**: Adjust the temperature and the number of API calls to tune outputs.
- **User-friendly Interface**: Built with Gradio for an intuitive chat experience.
- **Document Selection**: Choose which uploaded documents to include in your queries.

## How It Works

1. **Document Processing**:
   - Upload PDF documents, which are parsed with either PyPDF or LlamaParse.
   - Parsed documents are stored in a FAISS vector index for efficient retrieval.

2. **Embedding**:
   - Uses Hugging Face embeddings (default: `sentence-transformers/all-mpnet-base-v2`) for document indexing and query matching.

3. **Query Processing**:
   - For PDF queries, relevant document sections are retrieved from the FAISS database.
   - For web searches, results are fetched using the DuckDuckGo search API.

4. **Response Generation**:
   - Queries are processed using the selected AI model (options include Mistral, Mixtral, and others).
   - Responses are generated based on the retrieved context (from PDFs or web search).

5. **User Interaction**:
   - Users can chat with the AI, asking questions about uploaded documents or general queries.
   - The interface allows for adjusting model parameters and switching between PDF and web search modes.
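
The retrieval flow described above can be sketched in plain Python. This is a minimal illustration, not the application's code: a toy word-count embedding stands in for the sentence-transformers model, a brute-force cosine search stands in for FAISS, and the names `chunk_text`, `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Brute-force top-k search (a vector index would replace this loop)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Combine retrieved context and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "FAISS stores document vectors for similarity search.",
    "Gradio provides the chat interface.",
    "DuckDuckGo supplies web search results.",
]
top = retrieve("Which component stores vectors?", chunks, k=1)
prompt = build_prompt("Which component stores vectors?", top)
```

The same `build_prompt` step would apply in web search mode, with search result snippets taking the place of the retrieved PDF chunks.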

## Setup and Usage

1. Install the required dependencies (see Requirements above).
2. Set up the necessary API keys and tokens in your environment variables.
3. Run the main script to launch the Gradio interface.
4. Upload PDF documents using the file input at the top of the interface.
5. Select documents to query using the checkboxes.
6. Toggle between PDF chat and web search modes as needed.
7. Adjust the temperature and the number of API calls to tune responses.
8. Start chatting and asking questions!
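
Before launching, it can help to confirm that the API keys from step 2 are actually set. The sketch below is one possible check; the variable names in `REQUIRED` and the helper `missing_env_vars` are assumptions for illustration, so substitute whatever names the application actually reads.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Hypothetical variable names; replace with the ones the app expects.
REQUIRED = ["HUGGINGFACE_TOKEN", "LLAMA_CLOUD_API_KEY"]

# A sample environment is passed here so the example is reproducible;
# omit `env` to check the real process environment.
missing = missing_env_vars(REQUIRED, env={"HUGGINGFACE_TOKEN": "hf_example"})
if missing:
    print("Set these before launching:", ", ".join(missing))
```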

## Models

The project supports multiple AI models, including:
- mistralai/Mistral-7B-Instruct-v0.3
- mistralai/Mixtral-8x7B-Instruct-v0.1
- meta/llama-3.1-8b-instruct
- mistralai/Mistral-Nemo-Instruct-2407
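
As a rough sketch of how a model choice from the interface might be validated against this list, the helper below matches the requested ID case-insensitively and falls back to a default. The function `resolve_model` and the choice of default are assumptions, not the application's actual logic.

```python
SUPPORTED_MODELS = [
    "mistralai/Mistral-7B-Instruct-v0.3",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "meta/llama-3.1-8b-instruct",
    "mistralai/Mistral-Nemo-Instruct-2407",
]


def resolve_model(requested, default=SUPPORTED_MODELS[0]):
    """Return the listed ID matching `requested` (case-insensitive), else `default`."""
    by_lower = {model.lower(): model for model in SUPPORTED_MODELS}
    return by_lower.get((requested or "").strip().lower(), default)


selected = resolve_model("MISTRALAI/mixtral-8x7b-instruct-v0.1")
fallback = resolve_model("unknown/model")
```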

## Future Improvements

- Integration of more embedding models for improved performance.
- Enhanced PDF parsing capabilities.
- Support for additional file formats beyond PDF.
- Improved caching for faster response times.

## Contribution

Contributions to this project are welcome! Please feel free to submit issues or pull requests on the project's GitHub repository.