---
title: FAQ Chatbot Using RAG
emoji: 💬
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: "1.44.1"
app_file: app.py
pinned: false
---
# FAQ Chatbot Using RAG for Customer Support - Setup Instructions

Follow these steps to set up and run the e-commerce FAQ chatbot, optimized for hardware with 16-19GB RAM and an 8-11GB GPU.
## Prerequisites

- Python 3.8 or higher
- CUDA-compatible GPU with 8-11GB VRAM
- 16-19GB RAM
- Internet connection (for downloading models and datasets)
## Step 1: Create Project Directory Structure

```bash
# Create the project directory
mkdir faq-rag-chatbot
cd faq-rag-chatbot

# Create the source and data directories
mkdir -p src data
```
## Step 2: Create Virtual Environment

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
## Step 3: Create Project Files

Create all the required files with the optimized code provided:

1. `requirements.txt` (a sample sketch is shown after this list)
2. `src/__init__.py`
3. `src/data_processing.py`
4. `src/embedding.py`
5. `src/llm_response.py`
6. `src/utils.py`
7. `app.py`
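
The exact contents of `requirements.txt` depend on the code provided; a plausible sketch, assuming the usual Hugging Face and retrieval stack, is:

```text
# Hypothetical requirements.txt -- adjust packages and pins to match the provided code
streamlit==1.44.1
torch
transformers
accelerate
bitsandbytes           # 4-bit quantization for larger models
sentence-transformers  # embedding model for retrieval
faiss-cpu              # vector similarity search (assumed retrieval backend)
datasets               # downloads the e-commerce FAQ dataset from Hugging Face
psutil                 # memory monitoring (also installed in Step 4)
```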
## Step 4: Install Dependencies

```bash
# Install required packages
pip install -r requirements.txt

# Additional dependency for memory monitoring
pip install psutil
```
## Step 5: Run the Application

```bash
# Make sure the virtual environment is activated
# Run the Streamlit app
streamlit run app.py
```
## Memory Optimization Notes

This implementation includes several optimizations for systems with 16-19GB RAM and an 8-11GB GPU:

1. **Default to Smaller Models**: The app defaults to Phi-2, which works well on 8GB GPUs
2. **4-bit Quantization**: Uses 4-bit quantization for larger models like Mistral-7B (see the sketch after this list)
3. **Memory Offloading**: Offloads weights to CPU when not in use
4. **Batch Processing**: Processes embeddings in smaller batches
5. **Garbage Collection**: Aggressively frees memory after operations
6. **Response Length Limits**: Generates shorter responses to save memory
7. **CPU Embedding**: Keeps the embedding model on CPU to save GPU memory for the LLM
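
To illustrate points 2, 5, and 7, here is a minimal sketch of loading a 4-bit quantized LLM on the GPU while keeping the embedder on the CPU. The model names and the `load_models` helper are illustrative assumptions; the actual logic lives in `src/embedding.py` and `src/llm_response.py`.

```python
# Minimal sketch (assumed model names); not the app's exact implementation
import gc
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from sentence_transformers import SentenceTransformer

def load_models(llm_name="mistralai/Mistral-7B-Instruct-v0.2",
                embed_name="sentence-transformers/all-MiniLM-L6-v2"):
    # 4-bit quantization keeps a 7B model within roughly 8-11GB of VRAM
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(llm_name)
    llm = AutoModelForCausalLM.from_pretrained(
        llm_name,
        quantization_config=bnb_config,
        device_map="auto",        # lets accelerate offload layers to CPU if needed
    )

    # Embedding model stays on CPU so the GPU is reserved for the LLM
    embedder = SentenceTransformer(embed_name, device="cpu")

    gc.collect()
    torch.cuda.empty_cache()      # aggressively free memory after loading
    return tokenizer, llm, embedder
```

Embeddings can then be computed in small batches (point 4), for example `embedder.encode(faq_texts, batch_size=16)`, which keeps peak memory usage low.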
## Using the Chatbot

1. The application will automatically download the e-commerce FAQ dataset from Hugging Face
2. Choose an appropriate model based on your available GPU memory (a simple selection heuristic is sketched after this list):
   - For 8GB GPU: Use Phi-2 (default)
   - For 10-11GB GPU: You can try Mistral-7B with 4-bit quantization
   - For limited GPU or CPU-only: Use TinyLlama-1.1B
3. Type a question or select a sample question
4. The system will retrieve relevant FAQs and generate a response
5. Monitor memory usage in the sidebar
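
If you want to automate that choice, one possible heuristic based on detected VRAM is sketched below; the thresholds and model identifiers are assumptions, not the app's actual selection logic.

```python
# Hedged sketch: pick a model by available VRAM (thresholds are assumptions)
import torch

def choose_model() -> str:
    if not torch.cuda.is_available():
        return "TinyLlama/TinyLlama-1.1B-Chat-v1.0"    # CPU-only fallback
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if vram_gb >= 10:
        return "mistralai/Mistral-7B-Instruct-v0.2"    # use with 4-bit quantization
    if vram_gb >= 8:
        return "microsoft/phi-2"                       # default
    return "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```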
## Troubleshooting

- **Out of Memory Errors**: If you encounter CUDA out-of-memory errors, switch to a smaller model like TinyLlama-1.1B
- **Slow Response Times**: The first response may be slow while the model loads; subsequent responses will be faster
- **Model Loading Issues**: If Mistral-7B fails to load, the system will automatically fall back to Phi-2 (sketched below)
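
That fallback could be implemented roughly as follows; the exception types and model names are assumptions about how `src/llm_response.py` might handle it.

```python
# Hedged sketch: fall back to Phi-2 if Mistral-7B cannot be loaded
from transformers import AutoModelForCausalLM

def load_with_fallback(primary="mistralai/Mistral-7B-Instruct-v0.2",
                       fallback="microsoft/phi-2", **kwargs):
    try:
        return AutoModelForCausalLM.from_pretrained(primary, **kwargs)
    except (RuntimeError, OSError) as err:   # e.g. CUDA OOM or download failure
        print(f"Falling back to {fallback}: {err}")
        return AutoModelForCausalLM.from_pretrained(fallback, **kwargs)
```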
## Performance Considerations

- The embedding and retrieval components work efficiently even on limited hardware
- Response generation speed depends on the model size and available GPU memory
- For optimal performance with an 8GB GPU, stick with the Phi-2 model
- For faster responses at some cost in accuracy, use TinyLlama-1.1B