---
title: FAQ Chatbot Using RAG
emoji: 💬
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: "1.44.1"
app_file: app.py
pinned: false
---
# FAQ Chatbot Using RAG for Customer Support - Setup Instructions
Follow these steps to set up and run the e-commerce FAQ chatbot, optimized for hardware with 16-19GB of RAM and a GPU with 8-11GB of VRAM.
## Prerequisites
- Python 3.8 or higher
- CUDA-compatible GPU with 8-11GB VRAM
- 16-19GB RAM
- Internet connection (for downloading models and datasets)
## Step 1: Create Project Directory Structure
```bash
# Create the project directory
mkdir faq-rag-chatbot
cd faq-rag-chatbot
# Create the source and data directories
mkdir -p src data
```
## Step 2: Create Virtual Environment
```bash
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
## Step 3: Create Project Files
Create all the required files with the optimized code provided (an illustrative `requirements.txt` is sketched after this list):
1. `requirements.txt`
2. `src/__init__.py`
3. `src/data_processing.py`
4. `src/embedding.py`
5. `src/llm_response.py`
6. `src/utils.py`
7. `app.py`
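The contents of `requirements.txt` come from the provided code; for orientation only, a setup like this one typically draws on packages along these lines. The list below is an unpinned, illustrative sketch (the actual file and versions may differ):
```text
streamlit
torch
transformers
accelerate
bitsandbytes
sentence-transformers
datasets
```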
## Step 4: Install Dependencies
```bash
# Install required packages
pip install -r requirements.txt
# Additional dependency for memory monitoring
pip install psutil
```
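Before launching the app, you can optionally confirm that PyTorch can see your GPU. This quick check is not required, just a convenience:
```python
import torch

# Confirm that a CUDA device is visible to PyTorch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)
```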
## Step 5: Run the Application
```bash
# Make sure the virtual environment is activated
# Run the Streamlit app
streamlit run app.py
```
## Memory Optimization Notes
This implementation includes several optimizations for systems with 16-19GB RAM and 8-11GB GPU:
1. **Default to Smaller Models**: The app defaults to Phi-2, which works well on 8GB GPUs
2. **4-bit Quantization**: Uses 4-bit quantization for larger models like Mistral-7B (see the sketch after this list)
3. **Memory Offloading**: Offloads weights to CPU when not in use
4. **Batch Processing**: Processes embeddings in smaller batches
5. **Garbage Collection**: Aggressively frees memory after operations
6. **Response Length Limits**: Generates shorter responses to save memory
7. **CPU Embedding**: Keeps the embedding model on CPU to save GPU memory for the LLM
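As a rough illustration of how points 2, 4, 5, 6, and 7 fit together, the snippet below loads an LLM with 4-bit quantization while keeping the embedding model on CPU. The model names, batch size, and token limit are assumptions made for the sketch, not the project's actual settings:
```python
import gc
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from sentence_transformers import SentenceTransformer

# 4-bit quantization (requires bitsandbytes), used for larger models such as Mistral-7B
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed model id for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place/offload weights across GPU and CPU
)

# Keep the embedding model on CPU so GPU memory is reserved for the LLM,
# and encode in small batches to limit peak memory.
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")  # assumed embedding model
faq_texts = ["How do I return an item?", "When will my order ship?"]
embeddings = embedder.encode(faq_texts, batch_size=16, show_progress_bar=False)

# Cap response length to bound generation-time memory.
inputs = tokenizer("How do I return an item?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Aggressively release memory after heavy operations.
gc.collect()
torch.cuda.empty_cache()
```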
## Using the Chatbot
1. The application will automatically download the e-commerce FAQ dataset from Hugging Face
2. Choose an appropriate model based on your available GPU memory:
- For 8GB GPU: Use Phi-2 (default)
- For 10-11GB GPU: You can try Mistral-7B with 4-bit quantization
- For limited GPU or CPU-only: Use TinyLlama-1.1B
3. Type a question or select a sample question
4. The system will retrieve relevant FAQs and generate a response
5. Monitor memory usage in the sidebar (a minimal sketch of such a readout follows)
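The sidebar readout can be built with psutil (installed in Step 4) and Streamlit's metric widget; a minimal sketch, which may differ from the actual `app.py`, looks like this:
```python
import psutil
import streamlit as st
import torch

def show_memory_usage():
    # System RAM usage via psutil
    ram = psutil.virtual_memory()
    st.sidebar.metric("RAM used", f"{ram.used / 1e9:.1f} / {ram.total / 1e9:.1f} GB")

    # GPU memory usage via PyTorch, if a CUDA device is available
    if torch.cuda.is_available():
        gpu_used = torch.cuda.memory_allocated() / 1e9
        gpu_total = torch.cuda.get_device_properties(0).total_memory / 1e9
        st.sidebar.metric("GPU memory", f"{gpu_used:.1f} / {gpu_total:.1f} GB")

show_memory_usage()
```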
## Troubleshooting
- **Out of Memory Errors**: If you encounter CUDA out of memory errors, switch to a smaller model like TinyLlama-1.1B
- **Slow Response Times**: The first response may be slow while the model loads; subsequent responses will be faster
- **Model Loading Issues**: If Mistral-7B fails to load, the system will automatically fall back to Phi-2 (a sketch of this fallback pattern follows)
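This kind of fallback is typically a try/except around model loading. The sketch below is illustrative only; the model ids are assumptions and the actual logic in `src/llm_response.py` may differ:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

PRIMARY_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed id for illustration
FALLBACK_MODEL = "microsoft/phi-2"

def load_model_with_fallback():
    try:
        bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
        model = AutoModelForCausalLM.from_pretrained(
            PRIMARY_MODEL, quantization_config=bnb_config, device_map="auto"
        )
        tokenizer = AutoTokenizer.from_pretrained(PRIMARY_MODEL)
    except (RuntimeError, torch.cuda.OutOfMemoryError, OSError):
        # Fall back to the smaller Phi-2 model if the larger one cannot be loaded
        model = AutoModelForCausalLM.from_pretrained(
            FALLBACK_MODEL, torch_dtype=torch.float16, device_map="auto"
        )
        tokenizer = AutoTokenizer.from_pretrained(FALLBACK_MODEL)
    return model, tokenizer
```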
## Performance Considerations
- The embedding and retrieval components work efficiently even on limited hardware
- Response generation speed depends on the model size and available GPU memory
- For optimal performance with an 8GB GPU, stick with the Phi-2 model
- For faster responses at the cost of some answer quality, use TinyLlama-1.1B