
🔧 Model Configuration Guide

The backend now supports configurable models via environment variables, making it easy to switch between different AI models without code changes.

📋 Environment Variables

Primary Configuration

# Main AI model for text generation (required)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"

# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
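
For reference, the backend can read these with ordinary environment lookups. The snippet below is only a sketch of that pattern, not the actual backend_service.py code; the default values are the ones documented above.

import os

# Fall back to the documented defaults when a variable is unset
AI_MODEL = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
VISION_MODEL = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")
HF_TOKEN = os.environ.get("HF_TOKEN")  # None when no token is provided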

🚀 Usage Examples

1. Use DeepSeek-R1 (Default)

# Uses the default DeepSeek-R1 model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py

2. Use DialoGPT (Faster, smaller)

# Switch to lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py

3. Use Unsloth 4-bit Quantized Models

# Use Unsloth 4-bit Mistral model (memory efficient)
export AI_MODEL="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
./gradio_env/bin/python backend_service.py

# Use other Unsloth models
export AI_MODEL="unsloth/llama-3-8b-Instruct-bnb-4bit"
./gradio_env/bin/python backend_service.py
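
Pre-quantized bnb-4bit checkpoints like these load through the standard transformers API; the quantization config ships inside the checkpoint, so no extra flags are needed beyond having bitsandbytes and accelerate installed. A minimal sketch, not the backend's actual loading code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" places the 4-bit weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")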

4. Use Other Popular Models

# Use Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py

# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py

# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py

5. Use a Different Vision Model

export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
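
Both vision models listed here are image-captioning checkpoints, so they can be exercised through the same transformers pipeline. A sketch of that pattern, using a hypothetical photo.jpg as input:

import os
from transformers import pipeline

# "image-to-text" handles BLIP and vit-gpt2 captioning checkpoints alike
captioner = pipeline(
    "image-to-text",
    model=os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base"),
)
print(captioner("photo.jpg"))  # e.g. [{'generated_text': 'a dog on a beach'}]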

πŸ“ Startup Script Examples

Development Mode (Fast startup)

#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py

Production Mode (Default model)

#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py

Testing Mode (Lightweight)

#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py

πŸ” Model Verification

After starting the backend, check which model is loaded:

curl http://localhost:8000/health

The response will show:

{
  "status": "healthy",
  "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
  "version": "1.0.0"
}
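
To run the same check from Python (for example in a smoke test), here is a small sketch using requests, assuming the backend is listening on localhost:8000 as above:

import requests

resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
health = resp.json()
assert health["status"] == "healthy"
print(f"Loaded model: {health['model']}")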

📊 Model Comparison

| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| microsoft/DialoGPT-medium | ~355MB | ⚡ Fast | Good | Development/Testing |
| deepseek-ai/DeepSeek-R1-0528-Qwen3-8B | ~16GB | 🐌 Slow | ⭐ Excellent | Production |
| unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit | ~7GB | 🚀 Medium | ⭐ Excellent | Production (4-bit) |
| HuggingFaceH4/zephyr-7b-beta | ~14GB | 🐌 Slow | ⭐ Excellent | Chat/Conversation |
| codellama/CodeLlama-7b-Instruct-hf | ~13GB | 🐌 Slow | ⭐ Good | Code Generation |

🛠️ Troubleshooting

Model Not Found

# Verify model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi
api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'✅ Model exists: {info.id}')
except Exception:
    print('❌ Model not found')
"

Memory Issues

# Use smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium"  # ~355MB
# or
export AI_MODEL="distilgpt2"  # ~82MB

Authentication Issues

# Set HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
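
The token can also be registered programmatically; huggingface_hub ships a login helper that later from_pretrained calls pick up:

import os
from huggingface_hub import login

# Reads the same HF_TOKEN variable documented above
login(token=os.environ["HF_TOKEN"])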

🎯 Quick Switch Commands

# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py

# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py

# Quick switch with custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" VISION_MODEL="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py

Note that export keeps the variables set for the rest of your shell session; to set them for a single run only, prefix the command instead: AI_MODEL="distilgpt2" ./gradio_env/bin/python backend_service.py

✅ Summary

  • Environment Variable: AI_MODEL controls the main text generation model
  • Default: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B (your original preference)
  • Alternative: microsoft/DialoGPT-medium (faster for development)
  • Vision Model: VISION_MODEL controls image processing model
  • No Code Changes: Switch models by changing environment variables only

The DeepSeek-R1 model remains the default; it is simply configurable now, so you can switch whenever needed.