Exodus: Bilingual FAQ Chatbot with Hybrid Model Support

Exodus is a production-ready bilingual chatbot that combines Llama 3 (served via Ollama) and Qwen2.5-7B-Instruct-AWQ to handle both English and Arabic queries. It uses a hybrid approach that pairs semantic search with fine-tuned responses.

Model Description

The system uses two main models, routed per query (see the sketch after this list):

  1. Llama 3 (via Ollama):

    • Used for simpler queries and general conversation
    • Managed through Ollama for efficient resource usage
    • Requires ~2GB system RAM
  2. Qwen2.5-7B-Instruct-AWQ:

    • Used for complex queries and specialized tasks
    • 4-bit quantized for efficiency
    • Requires ~8GB VRAM + ~4GB system RAM
    • Fine-tuned on FAQ data using QLoRA
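
How a query is split between the two backends is a per-query routing decision. Below is a minimal sketch of such a router, using message length and Arabic-script detection as the signals; the function name and thresholds are hypothetical, and the repository's actual heuristics may differ:

import re

# Arabic script occupies the U+0600-U+06FF Unicode block.
ARABIC_CHARS = re.compile(r"[\u0600-\u06FF]")

def route_query(message: str) -> str:
    """Return the id of the model that should handle a query.

    Illustrative heuristic only: a real deployment may also weigh
    semantic-search confidence or conversation state.
    """
    is_arabic = ARABIC_CHARS.search(message) is not None
    is_long = len(message.split()) > 25  # crude complexity proxy
    # Complex or Arabic queries go to the fine-tuned Qwen model;
    # short English chit-chat stays on the lighter Llama 3 instance.
    if is_arabic or is_long:
        return "qwen2.5-7b-instruct-awq"
    return "llama3"

print(route_query("Hi there!"))  # -> llama3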

Intended Use

This model is designed for:

  • Bilingual FAQ systems (English and Arabic)
  • Customer support automation
  • Documentation assistance
  • General conversational tasks

Out-of-Scope Uses

This model should not be used for:

  • Critical decision-making without human oversight
  • Medical, legal, or financial advice
  • Generation of harmful or inappropriate content

How to Use

1. Environment Setup

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Model Setup

Ollama Setup (for Llama 3):

# Install Ollama from https://ollama.ai
# Pull the Llama 3 model
ollama pull llama3:latest

Qwen Setup:

# The model will be downloaded automatically when first used
# Or manually download from HuggingFace:
huggingface-cli download Qwen/Qwen2.5-7B-Instruct-AWQ
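
To load the AWQ checkpoint directly in Python instead of waiting for the automatic download, the standard Transformers pattern applies. This is a sketch: it assumes transformers with AWQ support (e.g. the autoawq package) is installed and a GPU meeting the VRAM requirement above is available.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # AWQ weights resolve to fp16 kernels
    device_map="auto",    # place layers on the available GPU
)

messages = [{"role": "user", "content": "What is your return policy?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))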

3. Running the System

# Start the server
python main.py

# In a separate terminal, start the frontend
cd frontend
npm install
npm run dev

4. API Usage

import requests

# Switch between models
response = requests.post(
    "http://localhost:8000/api/models/switch",
    json={"model_id": "llama3"},
)

# Send a query
response = requests.post(
    "http://localhost:8000/api/chat",
    json={"message": "What is your return policy?", "language": "en"},
)

Fine-tuning

To fine-tune the model on your own FAQ data:

  1. Prepare your data in JSONL format:
{"question": "What is your return policy?", "answer": "Our return policy allows..."}
  2. Run the fine-tuning pipeline:
cd Fine-tune
./run.sh

For more detailed fine-tuning options:

./run.sh --help
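
If your FAQ data starts life as a spreadsheet, converting it to the JSONL format shown in step 1 takes only a few lines. This sketch assumes a faq.csv file with question and answer columns; adjust the names to match your data:

import csv
import json

# Convert a two-column CSV (question, answer) into the JSONL format above.
with open("faq.csv", newline="", encoding="utf-8") as src, \
     open("faq.jsonl", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        record = {"question": row["question"].strip(),
                  "answer": row["answer"].strip()}
        # ensure_ascii=False keeps Arabic text readable in the output file.
        dst.write(json.dumps(record, ensure_ascii=False) + "\n")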

Performance and Limitations

Performance Metrics

  • Response time:
    • Llama 3 (via Ollama): ~1-2 seconds
    • Qwen2.5: ~2-3 seconds
  • Memory usage:
    • Llama 3: ~2GB RAM
    • Qwen2.5: ~8GB VRAM + ~4GB RAM
  • Accuracy on FAQ queries: >90%
  • Arabic language support accuracy: >85%

Limitations

  1. Resource Requirements:

    • Minimum 8GB VRAM for Qwen2.5
    • Stable internet for initial model downloads
  2. Language Support:

    • Primary support for English and Arabic
    • Other languages may work but are not optimized
  3. Response Time:

    • May be slower on CPU-only systems
    • Complex queries take longer to process

Training Details

Training Data

The model can be fine-tuned on:

  • FAQ datasets
  • Customer support conversations
  • Documentation Q&A pairs

Training Process

  1. Data Preparation:

    • Convert to JSONL format
    • Clean and preprocess text
    • Split into train/validation sets
  2. Fine-tuning:

    • Uses QLoRA for efficient training
    • 4-bit quantization
    • Parameter-Efficient Fine-Tuning (PEFT)
  3. Hyperparameters (see the config sketch after this list):

    • Learning rate: 2e-4
    • Batch size: 4
    • LoRA rank: 8
    • LoRA alpha: 16
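
In PEFT/bitsandbytes terms, those hyperparameters translate to a configuration roughly like the following. This is a sketch, not the repository's training script: the dropout, target modules, and epoch count are assumptions.

import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,                    # assumed; not listed above
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwen-faq-qlora",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    num_train_epochs=3,                   # assumed; not listed above
)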

Ethical Considerations

  1. Content Filtering:

    • Built-in profanity filtering
    • Content moderation for both English and Arabic
  2. Bias Mitigation:

    • Regular evaluation of responses
    • Configurable content filters
  3. Privacy:

    • No user data retention
    • Local model deployment option

Maintenance and Support

The project is actively maintained with:

  • Regular model updates
  • Security patches
  • Bug fixes
  • Community contributions

For support:

  1. Check the GitHub Issues
  2. Review the Documentation
  3. Join the community discussions

Citation

If you use this model in your research, please cite:

@software{exodus_chatbot,
  title = {Exodus: Bilingual FAQ Chatbot with Hybrid Model Support},
  year = {2024},
  url = {https://github.com/yazeedmshayekh2/Exodus},
  author = {Yazeed Mshayekh},
  version = {1.0.0}
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
