---
title: FAQ Chatbot Using RAG
emoji: 💬
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: "1.44.1"
app_file: app.py
pinned: false
---

# FAQ Chatbot Using RAG for Customer Support - Setup Instructions

Follow these steps to set up and run the e-commerce FAQ chatbot, optimized for hardware with 16-19GB RAM and an 8-11GB GPU.

## Prerequisites

- Python 3.8 or higher
- CUDA-compatible GPU with 8-11GB VRAM
- 16-19GB RAM
- Internet connection (for downloading models and datasets)

## Step 1: Create Project Directory Structure

```bash
# Create the project directory
mkdir faq-rag-chatbot
cd faq-rag-chatbot

# Create the source and data directories
mkdir -p src data
```

## Step 2: Create Virtual Environment

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```

## Step 3: Create Project Files

Create all the required files with the optimized code provided:

1. `requirements.txt`
2. `src/__init__.py`
3. `src/data_processing.py`
4. `src/embedding.py`
5. `src/llm_response.py`
6. `src/utils.py`
7. `app.py`

## Step 4: Install Dependencies

```bash
# Install required packages
pip install -r requirements.txt

# Additional dependency for memory monitoring
pip install psutil
```

## Step 5: Run the Application

```bash
# Make sure the virtual environment is activated
# Run the Streamlit app
streamlit run app.py
```

## Memory Optimization Notes

This implementation includes several optimizations for systems with 16-19GB RAM and an 8-11GB GPU (illustrative sketches of the quantization and CPU-embedding patterns appear in the appendices at the end of this document):

1. **Default to Smaller Models**: The app defaults to Phi-2, which works well on 8GB GPUs
2. **4-bit Quantization**: Uses 4-bit quantization for larger models such as Mistral-7B
3. **Memory Offloading**: Offloads weights to CPU when not in use
4. **Batch Processing**: Processes embeddings in smaller batches
5. **Garbage Collection**: Aggressively frees memory after operations
6. **Response Length Limits**: Generates shorter responses to save memory
7. **CPU Embedding**: Keeps the embedding model on CPU to reserve GPU memory for the LLM

## Using the Chatbot

1. The application will automatically download the e-commerce FAQ dataset from Hugging Face
2. Choose an appropriate model based on your available GPU memory:
   - For an 8GB GPU: Use Phi-2 (default)
   - For a 10-11GB GPU: You can try Mistral-7B with 4-bit quantization
   - For limited GPU or CPU-only setups: Use TinyLlama-1.1B
3. Type a question or select a sample question
4. The system will retrieve relevant FAQs and generate a response
5. Monitor memory usage in the sidebar

## Troubleshooting

- **Out of Memory Errors**: If you encounter CUDA out-of-memory errors, switch to a smaller model such as TinyLlama-1.1B
- **Slow Response Times**: The first response may be slow while the model loads; subsequent responses will be faster
- **Model Loading Issues**: If Mistral-7B fails to load, the system will automatically fall back to Phi-2

## Performance Considerations

- The embedding and retrieval components work efficiently even on limited hardware
- Response generation speed depends on the model size and available GPU memory
- For optimal performance with an 8GB GPU, stick with the Phi-2 model
- For faster responses at lower accuracy, use TinyLlama-1.1B
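
## Appendix A: 4-bit Quantized Model Loading (Sketch)

The 4-bit quantization and CPU-offloading optimizations described in the Memory Optimization Notes are typically wired up through the `transformers` and `bitsandbytes` libraries. The snippet below is a minimal sketch of that pattern, not the exact code in `src/llm_response.py`; the model ID (`mistralai/Mistral-7B-Instruct-v0.2`), prompt, and parameter values are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization keeps a 7B model within an 8-11GB GPU budget
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place/offload layers automatically
)

# Short generations keep peak memory low (see "Response Length Limits")
inputs = tokenizer("How do I return an item?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If loading still exceeds available VRAM, the same pattern applies to a smaller model such as `microsoft/phi-2` or `TinyLlama/TinyLlama-1.1B-Chat-v1.0`, usually without needing quantization at all.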
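
## Appendix B: CPU Embedding and Retrieval (Sketch)

The "CPU Embedding" and "Batch Processing" notes can be illustrated with `sentence-transformers`. This is a hedged sketch assuming a MiniLM-style embedder and cosine-similarity retrieval; the actual logic in `src/embedding.py` and `src/data_processing.py` may differ, and the FAQ strings here are placeholders rather than entries from the real dataset.

```python
from sentence_transformers import SentenceTransformer, util

# Keep the embedder on CPU so GPU memory stays free for the LLM (assumed model name)
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cpu")

faq_questions = [
    "How do I return an item?",
    "What payment methods do you accept?",
    "How long does shipping take?",
]

# Encode in small batches to cap memory use
faq_embeddings = embedder.encode(faq_questions, batch_size=16, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 2):
    """Return the top_k most similar FAQ questions for a user query."""
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, faq_embeddings)[0]
    top = scores.topk(k=min(top_k, len(faq_questions)))
    return [(faq_questions[i], float(s)) for s, i in zip(top.values, top.indices)]

print(retrieve("Which cards can I pay with?"))
```

The retrieved FAQ entries would then be passed to the LLM as context when generating the final answer, as described in "Using the Chatbot".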