# AI Chat Application for HuggingFace Spaces

A fully functional AI chat application for HuggingFace Spaces, integrating the Qwen3 Coder model behind an OpenAI-compatible API.
## Features
- Integration with Qwen/Qwen3-Coder-30B-A3B-Instruct model
- OpenAI-compatible API endpoints
- Professional web interface replicating Perplexity AI design
- Responsive layout with TailwindCSS styling
- Dark/light mode support
- Real-time streaming responses
- Conversation history management
- Copy response functionality
- Typing indicators
- Full GPU optimization
- Robust error handling and automatic connection recovery
- Caching mechanisms
- Ready for immediate deployment on HuggingFace Spaces
## Technology Stack
- Backend: Python, Gradio, FastAPI, Transformers, PyTorch
- Frontend: TailwindCSS, JavaScript, HTML5
- Infrastructure: Redis for caching, HuggingFace Spaces deployment
## Requirements
- Python 3.8+
- GPU with at least 24GB VRAM (for Qwen/Qwen3-Coder-30B-A3B-Instruct model)
- Redis server (optional, for conversation caching)
## Installation

Clone this repository:

```bash
git clone <repository-url>
cd ai-chat-app
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run the application:

```bash
python app.py
```
## Usage

### Web Interface
The application provides a web interface accessible at http://localhost:7860 when running locally. The interface features:
- Chat interface similar to Perplexity AI
- Dark/light mode toggle
- Conversation history sidebar
- Copy buttons for responses
- Typing indicators during response generation
### API Endpoints

The application exposes OpenAI-compatible endpoints:

- `POST /v1/chat/completions` - Chat completion endpoint
Example request:

```json
{
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "max_tokens": 1024,
  "temperature": 0.7
}
```
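The same request can be issued from Python. The snippet below builds the payload shown above; the base URL and port follow the local setup described under Usage, and the commented-out `requests` call is illustrative rather than a required client:

```python
import json

# OpenAI-compatible chat completion payload (matches the example above).
payload = {
    "messages": [
        {"role": "user", "content": "Hello, how are you?"}
    ],
    "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "max_tokens": 1024,
    "temperature": 0.7,
}

# With the app running locally (see Usage), send it with the requests library:
#   import requests
#   resp = requests.post("http://localhost:7860/v1/chat/completions", json=payload)
#   print(resp.json()["choices"][0]["message"]["content"])

print(json.dumps(payload, indent=2))
```

Because the endpoint follows the OpenAI schema, pointing the official `openai` Python client at `base_url="http://localhost:7860/v1"` should also work.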
## Deployment to HuggingFace Spaces

1. Create a new Space on HuggingFace with the following configuration:
   - SDK: Gradio
   - Hardware: GPU (recommended)
2. Upload all files to your Space repository.
3. The application will automatically start and be accessible through your Space URL.
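HuggingFace Spaces reads this configuration from a YAML front-matter block at the top of the Space's `README.md`. A minimal sketch; the title, emoji, and `sdk_version` are placeholders to adjust for your Space:

```yaml
---
title: AI Chat Application
emoji: 💬
sdk: gradio
sdk_version: 4.44.0  # pin to the Gradio version in requirements.txt
app_file: app.py
---
```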
## Configuration

The application can be configured through environment variables:

- `MODEL_NAME`: The HuggingFace model identifier (default: `Qwen/Qwen3-Coder-30B-A3B-Instruct`)
- `MAX_TOKENS`: Default maximum tokens for responses (default: 1024)
- `TEMPERATURE`: Default temperature for generation (default: 0.7)
- `REDIS_URL`: Redis connection URL for caching (optional)
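A minimal sketch of reading these variables with the documented defaults; the variable names come from this README, and the actual `app.py` may structure this differently:

```python
import os

# Configuration with the defaults documented above; unset variables fall back.
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3-Coder-30B-A3B-Instruct")
MAX_TOKENS = int(os.environ.get("MAX_TOKENS", "1024"))
TEMPERATURE = float(os.environ.get("TEMPERATURE", "0.7"))
REDIS_URL = os.environ.get("REDIS_URL")  # None disables conversation caching
```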
## Troubleshooting

### GPU Memory Issues

If you encounter GPU memory issues:

- Ensure your GPU has at least 24GB VRAM
- Try reducing the `max_tokens` parameter
- Use quantization techniques for model loading
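As a sketch of the quantization option, the model could be loaded in 4-bit with `transformers` and `bitsandbytes` (both must be installed). The exact settings are illustrative, not the app's actual loading code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

def load_quantized_model(model_name: str = "Qwen/Qwen3-Coder-30B-A3B-Instruct"):
    """Load the model in 4-bit NF4, roughly quartering its VRAM footprint."""
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quant_config,
        device_map="auto",  # spread layers across available GPUs
    )
    return tokenizer, model
```

Note that 4-bit loading trades some generation quality for memory, so test outputs before relying on it in production.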
### Model Loading Errors
If the model fails to load:
- Check your internet connection
- Ensure you have sufficient disk space
- Verify the model identifier is correct
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
## License
This project is licensed under the MIT License - see the LICENSE file for details.