# Hugging Face Spaces: FastAPI OpenAI-Compatible Backend
This project is ready to deploy as a Hugging Face Space using FastAPI and transformers (no vLLM, no llama-cpp/GGUF).
## Features
- OpenAI-compatible `/v1/chat/completions` endpoint
- Multimodal support (text + image, if the model supports it)
- Environment variable support via `.env`
- Hugging Face Spaces compatible (CPU or T4/RTX GPU)
## Usage (Local)
```bash
pip install -r requirements.txt
python -m uvicorn backend_service:app --host 0.0.0.0 --port 7860
```
## Usage (Hugging Face Spaces)
- Push this repo to your Hugging Face Space
- The Space will auto-launch the FastAPI backend
- Use the `/v1/chat/completions` endpoint from any OpenAI-compatible client, as in the sketch below
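Any OpenAI-compatible client can talk to the Space by pointing its base URL at `/v1`. A minimal sketch with the `openai` Python package (`<your-space>` is a placeholder, and this backend does not check the API key):
```python
from openai import OpenAI

# <your-space> is a placeholder for your Space's subdomain.
client = OpenAI(base_url="https://<your-space>.hf.space/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="google/gemma-3n-E4B-it",  # whichever model the backend serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```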
## Notes
- Only transformers models are supported (no GGUF/llama-cpp, no vLLM)
- Set your model in the `AI_MODEL` environment variable or edit `backend_service.py`
- For secrets, use the Hugging Face Spaces Secrets UI or a `.env` file
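A minimal `.env` sketch, assuming the variable names described above (the token value is a placeholder):
```bash
# .env (illustrative values only)
AI_MODEL=google/gemma-3n-E4B-it
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
```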
## Example curl
```bash
curl -X POST https://<your-space>.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemma-3n-E4B-it", "messages": [{"role": "user", "content": "Hello!"}]}'
```
---
For more, see the Hugging Face Spaces docs: https://huggingface.co/docs/hub/spaces-sdks-docker
# Fallback Logic
If vLLM fails to start or respond, the backend automatically falls back to the legacy backend, along the lines of the sketch below.
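A minimal sketch of that health-check-style fallback, assuming a vLLM server on localhost and a `legacy_generate` stand-in (all names here are illustrative, not the actual implementation):
```python
import requests

VLLM_URL = "http://localhost:8000"  # illustrative vLLM server address

def vllm_available(timeout: float = 2.0) -> bool:
    """Probe vLLM's health endpoint; any error counts as 'down'."""
    try:
        return requests.get(f"{VLLM_URL}/health", timeout=timeout).ok
    except requests.RequestException:
        return False

def legacy_generate(prompt: str) -> str:
    # Stand-in for the transformers-based legacy backend (illustrative).
    return f"[legacy] {prompt}"

def generate(prompt: str) -> str:
    if vllm_available():
        # Forward to vLLM's OpenAI-compatible completions endpoint.
        r = requests.post(
            f"{VLLM_URL}/v1/completions",
            json={"model": "default", "prompt": prompt, "max_tokens": 128},
            timeout=60,
        )
        r.raise_for_status()
        return r.json()["choices"][0]["text"]
    return legacy_generate(prompt)
```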
# Fine-tuning Gemma 3n E4B on MacBook M1 (Apple Silicon) with Unsloth
This project supports local fine-tuning of the Gemma 3n E4B model using Unsloth, PEFT/LoRA, and export to GGUF Q4_K_XL for efficient inference. The workflow is optimized for Apple Silicon (M1/M2/M3) and avoids CUDA/bitsandbytes dependencies.
## Prerequisites
- Python 3.10+
- macOS with Apple Silicon (M1/M2/M3)
- PyTorch with MPS backend (install via `pip install torch`)
- All dependencies in `requirements.txt` (install with `pip install -r requirements.txt`)
## Training Script Usage
Run the training script with your dataset (JSON/JSONL or Hugging Face format):
```bash
python training/train_gemma_unsloth.py \
  --job-id myjob \
  --output-dir training_runs/myjob \
  --dataset sample_data/train.jsonl \
  --prompt-field prompt --response-field response \
  --epochs 1 --batch-size 1 --gradient-accumulation 8 \
  --use-fp16 \
  --grpo --cpt \
  --export-gguf --gguf-out training_runs/myjob/adapter-gguf-q4_k_xl
```
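With `--prompt-field prompt --response-field response`, each line of `sample_data/train.jsonl` is expected to be one JSON object with those two fields. An illustrative record (the content is a made-up example):
```json
{"prompt": "What is the capital of France?", "response": "Paris is the capital of France."}
```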
**Flags:**
- `--grpo`: Enable GRPO (Group Relative Policy Optimization), if supported by your Unsloth version
- `--cpt`: Enable CPT (continued pretraining), if supported by your Unsloth version
- `--export-gguf`: Export to GGUF Q4_K_XL after training
- `--gguf-out`: Path to save the GGUF file
**Notes:**
- On Mac, bitsandbytes/xformers are disabled automatically.
- Training is slower than on CUDA GPUs; use small batch sizes and gradient accumulation.
- If Unsloth's GGUF export is unavailable, follow the printed instructions to use llama.cpp's `convert_hf_to_gguf.py` (named `convert-hf-to-gguf.py` in older llama.cpp checkouts).
## Troubleshooting
- If you see errors about missing CUDA or bitsandbytes, ensure you are running on Apple Silicon and have the latest Unsloth/Transformers.
- For memory errors, reduce `--batch-size` or `--cutoff-len`.
- For best results, use datasets formatted to match the official Gemma 3n chat template.
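You can inspect that template by rendering it with the model's tokenizer, a minimal sketch using the standard `transformers` API (the model ID is the one from the references below):
```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/gemma-3n-E4B-it")
messages = [{"role": "user", "content": "Hello!"}]
# Render the official chat template as plain text, with the assistant turn opened.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```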
## Example: Manual GGUF Export with llama.cpp
If the script prints a message about manual conversion, convert the trained model to GGUF and then quantize it. Note that llama.cpp's converter only emits high-precision types (f16/bf16/q8_0); K-quants are produced in a second step with `llama-quantize`, and the closest standard llama.cpp type to Q4_K_XL (an Unsloth naming) is `Q4_K_M`:
```bash
python convert_hf_to_gguf.py training_runs/myjob/adapter --outtype f16 \
  --outfile training_runs/myjob/adapter-f16.gguf
./llama-quantize training_runs/myjob/adapter-f16.gguf \
  training_runs/myjob/adapter-gguf-q4_k_m.gguf Q4_K_M
```
## References
- [Unsloth Documentation](https://unsloth.ai/)
- [Gemma 3n E4B Model Card](https://huggingface.co/unsloth/gemma-3n-E4B-it)
- [llama.cpp GGUF Export Guide](https://github.com/ggerganov/llama.cpp)
---
title: Multimodal AI Backend Service
emoji: 🚀
colorFrom: yellow
colorTo: purple
sdk: docker
app_port: 8000
pinned: false
---
# firstAI - Multimodal AI Backend 🚀
A powerful AI backend service with **multimodal capabilities** and **advanced deployment support** - supporting both text generation and image analysis using transformers pipelines.
## 🚀 Features
### 🤖 Configurable AI Models
- **Default Text Model**: Microsoft DialoGPT-medium (deployment-friendly)
- **Advanced Models**: Support for quantized models (Unsloth, 4-bit, GGUF)
- **Environment Configuration**: Runtime model selection via environment variables
- **Quantization Support**: Automatic 4-bit quantization with fallback mechanisms
### 🖼️ Multimodal Support
- Process text-only messages
- Analyze images from URLs
- Combined image + text conversations
- OpenAI Vision API compatible format
### 🚀 Production Ready
- **Enhanced Deployment**: Multi-level fallback for quantized models
- **Environment Flexibility**: Works in constrained deployment environments
- **Error Resilience**: Comprehensive error handling with graceful degradation
- FastAPI backend with automatic docs
- Health checks and monitoring
- PyTorch with MPS acceleration (Apple Silicon)
### 🔧 Model Configuration
Configure models via environment variables:
```bash
# Set custom text model (optional)
export AI_MODEL="microsoft/DialoGPT-medium"
# Set custom vision model (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"
# For private models (optional)
export HF_TOKEN="your_huggingface_token"
```
**Supported Model Types:**
- Standard models: `microsoft/DialoGPT-medium`, `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B`
- Quantized models: `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit`
- GGUF models: `unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF`
## 🚀 Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Start the Service
```bash
python backend_service.py
```
### 3. Test Multimodal Capabilities
```bash
python test_final.py
```
The service will start on **http://localhost:8001** with both text and vision models loaded.
## 💡 Usage Examples
### Text-Only Chat
```bash
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/DialoGPT-medium",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### Image Analysis
```bash
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Salesforce/blip-image-captioning-base",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "url": "https://example.com/image.jpg"
          }
        ]
      }
    ]
  }'
```
### Multimodal (Image + Text)
```bash
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Salesforce/blip-image-captioning-base",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "url": "https://example.com/image.jpg"
          },
          {
            "type": "text",
            "text": "What do you see in this image?"
          }
        ]
      }
    ]
  }'
```
## 🔧 Technical Details
### Architecture
- **FastAPI** web framework
- **Transformers** pipeline for AI models
- **PyTorch** backend with GPU/MPS support
- **Pydantic** for request/response validation
### Models
- **Text**: microsoft/DialoGPT-medium
- **Vision**: Salesforce/blip-image-captioning-base
### API Endpoints
- `GET /` - Service information
- `GET /health` - Health check
- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completions (text/multimodal)
- `GET /docs` - Interactive API documentation
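A quick smoke test of the read-only endpoints with `requests` (assumes the service is running on port 8001, as noted above):
```python
import requests

BASE = "http://localhost:8001"

# Health check: should report the service status.
print(requests.get(f"{BASE}/health").json())

# List the models the backend currently exposes.
print(requests.get(f"{BASE}/v1/models").json())
```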
## 🚀 Deployment
### Environment Variables
```bash
# Optional: Custom models
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="your_token_here"  # For private models
```
### Production Deployment
The service includes enhanced deployment capabilities:
- **Quantized Model Support**: Automatic handling of 4-bit and GGUF models
- **Fallback Mechanisms**: Multi-level fallback for constrained environments
- **Error Resilience**: Graceful degradation when quantization libraries are unavailable
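A minimal sketch of that kind of multi-level fallback (illustrative only; `load_with_fallback` is a hypothetical helper, not the actual code in `backend_service.py`):
```python
import torch
from transformers import AutoModelForCausalLM

def load_with_fallback(model_id: str):
    """Try 4-bit quantized loading first, then fp16, then plain fp32."""
    try:
        from transformers import BitsAndBytesConfig  # needs bitsandbytes installed
        quant = BitsAndBytesConfig(load_in_4bit=True)
        return AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant)
    except Exception:
        pass  # quantization library missing or unsupported on this hardware
    try:
        return AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    except Exception:
        return AutoModelForCausalLM.from_pretrained(model_id)  # fp32 last resort
```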
### Docker Deployment
```bash
# Build and run with Docker
docker build -t firstai .
docker run -p 8000:8000 firstai
```
### Testing Deployment
```bash
# Test quantization detection and fallbacks
python test_deployment_fallbacks.py
# Test health endpoint
curl http://localhost:8000/health
```
For comprehensive deployment guidance, see `DEPLOYMENT_ENHANCEMENTS.md`.
## 🧪 Testing
Run the comprehensive test suite:
```bash
python test_final.py
```
Test individual components:
```bash
python test_multimodal.py   # Basic multimodal tests
python test_pipeline.py     # Pipeline compatibility
```
## 📦 Dependencies
Key packages:
- `fastapi` - Web framework
- `transformers` - AI model pipelines
- `torch` - PyTorch backend
- `Pillow` - Image processing
- `accelerate` - Model acceleration
- `requests` - HTTP client
## 🎯 Integration Complete
This project successfully integrates:
✅ **Transformers image-text-to-text pipeline**
✅ **OpenAI Vision API compatibility**
✅ **Multimodal message processing**
✅ **Production-ready FastAPI service**
See `MULTIMODAL_INTEGRATION_COMPLETE.md` for detailed integration documentation.
---
title: AI Backend Service
emoji: 🚀
colorFrom: yellow
colorTo: purple
sdk: fastapi
sdk_version: 0.100.0
app_file: backend_service.py
pinned: false
---
# AI Backend Service 🚀
**Status: ✅ CONVERSION COMPLETE!**
Successfully converted from a non-functioning Gradio HuggingFace app to a production-ready FastAPI backend service with OpenAI-compatible API endpoints.
## Quick Start
### 1. Setup Environment
```bash
# Activate the virtual environment
source gradio_env/bin/activate
# Install dependencies (already done)
pip install -r requirements.txt
```
### 2. Start the Backend Service
```bash
python backend_service.py --port 8000 --reload
```
### 3. Test the API
```bash
# Run comprehensive tests
python test_api.py
# Or try usage examples
python usage_examples.py
```
## API Endpoints
| Endpoint               | Method | Description                         |
| ---------------------- | ------ | ----------------------------------- |
| `/`                    | GET    | Service information                 |
| `/health`              | GET    | Health check                        |
| `/v1/models`           | GET    | List available models               |
| `/v1/chat/completions` | POST   | Chat completion (OpenAI compatible) |
| `/v1/completions`      | POST   | Text completion                     |
## Example Usage
### Chat Completion
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/DialoGPT-medium",
    "messages": [
      {"role": "user", "content": "Hello! How are you?"}
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'
```
### Streaming Chat
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/DialoGPT-medium",
    "messages": [
      {"role": "user", "content": "Tell me a joke"}
    ],
    "stream": true
  }'
```
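From Python, the stream can be consumed line by line, assuming OpenAI-style server-sent events (`data: {...}` chunks terminated by `data: [DONE]`); a minimal sketch:
```python
import json
import requests

payload = {
    "model": "microsoft/DialoGPT-medium",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream": True,
}
with requests.post(
    "http://localhost:8000/v1/chat/completions", json=payload, stream=True
) as r:
    for line in r.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        delta = json.loads(chunk)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)
```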
## Files
- **`app.py`** - Original Gradio ChatInterface (still functional)
- **`backend_service.py`** - New FastAPI backend service ✅
- **`test_api.py`** - Comprehensive API testing
- **`usage_examples.py`** - Simple usage examples
- **`requirements.txt`** - Updated dependencies
- **`CONVERSION_COMPLETE.md`** - Detailed conversion documentation
## Features
✅ **OpenAI-Compatible API** - Drop-in replacement for the OpenAI API
✅ **Async FastAPI** - High-performance async architecture
✅ **Streaming Support** - Real-time response streaming
✅ **Error Handling** - Robust error handling with fallbacks
✅ **Production Ready** - CORS, logging, health checks
✅ **Docker Ready** - Easy containerization
✅ **Auto-reload** - Development-friendly auto-reload
✅ **Type Safety** - Full type hints with Pydantic validation
## Service URLs
- **Backend Service**: http://localhost:8000
- **API Documentation**: http://localhost:8000/docs
- **OpenAPI Spec**: http://localhost:8000/openapi.json
## Model Information
- **Current Model**: `microsoft/DialoGPT-medium`
- **Type**: Conversational AI model
- **Provider**: HuggingFace Inference API
- **Capabilities**: Text generation, chat completion
## Architecture
```
┌─────────────────────┐     ┌─────────────────────┐     ┌─────────────────────┐
│   Client Request    │────▶│   FastAPI Backend   │────▶│   HuggingFace API   │
│   (OpenAI format)   │     │  (backend_service)  │     │  (DialoGPT-medium)  │
└─────────────────────┘     └─────────────────────┘     └─────────────────────┘
                                       │
                                       ▼
                            ┌─────────────────────┐
                            │   OpenAI Response   │
                            │   (JSON/Streaming)  │
                            └─────────────────────┘
```
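The middle hop in the diagram (backend to HuggingFace API) can be exercised directly with `huggingface_hub`; an illustrative sketch, not the backend's actual code:
```python
from huggingface_hub import InferenceClient

# Uses the HF Inference API; pass token=... for private models or higher limits.
client = InferenceClient(model="microsoft/DialoGPT-medium")
print(client.text_generation("Hello! How are you?", max_new_tokens=50))
```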
## Development
The service includes:
- **Auto-reload** for development
- **Comprehensive logging** for debugging
- **Type checking** for code quality
- **Test suite** for reliability
- **Error handling** for robustness
## Production Deployment
Ready for production with:
- **Environment variables** for configuration
- **Health check endpoints** for monitoring
- **CORS support** for web applications
- **Docker compatibility** for containerization
- **Structured logging** for observability
---
**🎉 Conversion Status: COMPLETE!**
Successfully transformed from a broken Gradio app to a production-ready AI backend service.
For detailed conversion documentation, see [`CONVERSION_COMPLETE.md`](CONVERSION_COMPLETE.md).
# Gemma 3n GGUF FastAPI Backend (Hugging Face Space)
This Space provides an OpenAI-compatible chat API for Gemma 3n GGUF models, powered by FastAPI.
**Note:** On Hugging Face Spaces, the backend runs in `DEMO_MODE` (no model loaded) for demonstration and endpoint testing. For real inference, run locally with a GGUF model and llama-cpp-python.
## Endpoints
- `/health` - Health check
- `/v1/chat/completions` - OpenAI-style chat completions (returns a demo response)
- `/train/start` - Start a (demo) training job
- `/train/status/{job_id}` - Check training job status
- `/train/logs/{job_id}` - Get training logs
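A quick check of the demo endpoints (`<your-space>` and the model name are placeholders; in `DEMO_MODE` the completion is canned):
```bash
curl https://<your-space>.hf.space/health
curl -X POST https://<your-space>.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3n", "messages": [{"role": "user", "content": "Hello!"}]}'
```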
## Usage
1. **Clone this repo** or create a Hugging Face Space (type: FastAPI).
2. All dependencies are in `requirements.txt`.
3. The Space will start in demo mode (no model download required).
## Local Inference (with GGUF)
To run with a real model locally:
1. Download a Gemma 3n GGUF model (e.g. from https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF).
2. Set `AI_MODEL` to the local path or repo.
3. Unset `DEMO_MODE`.
4. Run:
```bash
pip install -r requirements.txt
export AI_MODEL=/path/to/gemma-3n-model.gguf  # step 2 (path is a placeholder)
unset DEMO_MODE                               # step 3
uvicorn gemma_gguf_backend:app --host 0.0.0.0 --port 8000
```
## License
Apache 2.0