
🔧 Model Configuration Guide

The backend now supports configurable models via environment variables, making it easy to switch between different AI models without code changes.

📋 Environment Variables

Primary Configuration

# Main AI model for text generation (required)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"

# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
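
For reference, the backend can read these with ordinary environment lookups. The snippet below is only a sketch of that pattern, not the actual backend_service.py code; the default values are the ones documented above.

import os

# Fall back to the documented defaults when a variable is unset
AI_MODEL = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
VISION_MODEL = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")
HF_TOKEN = os.environ.get("HF_TOKEN")  # None when no token is provided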

🚀 Usage Examples

1. Use DeepSeek-R1 (Default)

# Uses the default DeepSeek-R1 model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py

2. Use DialoGPT (Faster, smaller)

# Switch to lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py

3. Use Unsloth 4-bit Quantized Models

# Use Unsloth 4-bit Mistral model (memory efficient)
export AI_MODEL="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
./gradio_env/bin/python backend_service.py

# Use other Unsloth models
export AI_MODEL="unsloth/llama-3-8b-Instruct-bnb-4bit"
./gradio_env/bin/python backend_service.py
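
Pre-quantized bnb-4bit checkpoints like these load through the standard transformers API; the quantization config ships inside the checkpoint, so no extra flags are needed beyond having bitsandbytes and accelerate installed. A minimal sketch, not the backend's actual loading code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" places the 4-bit weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")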

4. Use Other Popular Models

# Use Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py

# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py

# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py

5. Use a Different Vision Model

export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
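
Both vision models listed here are image-captioning checkpoints, so they can be exercised through the same transformers pipeline. A sketch of that pattern, using a hypothetical photo.jpg as input:

import os
from transformers import pipeline

# "image-to-text" handles BLIP and vit-gpt2 captioning checkpoints alike
captioner = pipeline(
    "image-to-text",
    model=os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base"),
)
print(captioner("photo.jpg"))  # e.g. [{'generated_text': 'a dog on a beach'}]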

πŸ“ Startup Script Examples

Development Mode (Fast startup)

#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py

Production Mode (Default model)

#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py

Testing Mode (Lightweight)

#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py

πŸ” Model Verification

After starting the backend, check which model is loaded:

curl http://localhost:8000/health

The response will show:

{
  "status": "healthy",
  "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
  "version": "1.0.0"
}
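
To run the same check from Python (for example in a smoke test), here is a small sketch using requests, assuming the backend is listening on localhost:8000 as above:

import requests

resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
health = resp.json()
assert health["status"] == "healthy"
print(f"Loaded model: {health['model']}")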

📊 Model Comparison

| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| microsoft/DialoGPT-medium | ~355MB | ⚡ Fast | Good | Development/Testing |
| deepseek-ai/DeepSeek-R1-0528-Qwen3-8B | ~16GB | 🐌 Slow | ⭐ Excellent | Production |
| unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit | ~7GB | 🚀 Medium | ⭐ Excellent | Production (4-bit) |
| HuggingFaceH4/zephyr-7b-beta | ~14GB | 🐌 Slow | ⭐ Excellent | Chat/Conversation |
| codellama/CodeLlama-7b-Instruct-hf | ~13GB | 🐌 Slow | ⭐ Good | Code Generation |

🛠️ Troubleshooting

Model Not Found

# Verify model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi
api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'✅ Model exists: {info.id}')
except Exception:
    print('❌ Model not found')
"

Memory Issues

# Use smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium"  # ~355MB
# or
export AI_MODEL="distilgpt2"  # ~82MB

Authentication Issues

# Set HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
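
The token can also be registered programmatically; huggingface_hub ships a login helper that later from_pretrained calls pick up:

import os
from huggingface_hub import login

# Reads the same HF_TOKEN variable documented above
login(token=os.environ["HF_TOKEN"])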

🎯 Quick Switch Commands

# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py

# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py

# Quick switch with custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" VISION_MODEL="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py

Note that export keeps the variables set for the rest of your shell session; to set them for a single run only, prefix the command instead: AI_MODEL="distilgpt2" ./gradio_env/bin/python backend_service.py

✅ Summary

  • Environment Variable: AI_MODEL controls the main text generation model
  • Default: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B (your original preference)
  • Alternative: microsoft/DialoGPT-medium (faster for development)
  • Vision Model: VISION_MODEL controls image processing model
  • No Code Changes: Switch models by changing environment variables only

The DeepSeek-R1 model remains the default; it is simply configurable now, so you can switch whenever needed.