|
# AI Chat Application with Qwen Coder |
|
|
|
This is a fully functional AI chat application built for HuggingFace Spaces. It integrates the Qwen/Qwen3-Coder-30B-A3B-Instruct model and exposes OpenAI-compatible API endpoints.
|
|
|
## Features |
|
|
|
- **Qwen3 Coder Integration**: Direct integration with the Qwen/Qwen3-Coder-30B-A3B-Instruct model
|
- **OpenAI API Compatibility**: Implements OpenAI-compatible endpoints for seamless integration with existing clients
|
- **Streaming Responses**: Real-time response streaming for interactive chat experience |
|
- **Conversation History**: Persistent conversation history management |
|
- **Modern UI**: Responsive design inspired by Perplexity AI with TailwindCSS |
|
- **Dark/Light Mode**: Support for both dark and light themes |
|
- **Copy Responses**: One-click copying of AI responses |
|
- **Typing Indicators**: Visual indicators for AI response generation |
|
- **GPU Optimization**: Automatic CUDA detection and GPU acceleration when available
|
- **Error Handling**: Robust error handling with automatic connection recovery |
|
- **Caching**: Efficient caching mechanisms for improved performance |
|
|
|
## Project Structure |
|
|
|
```
/
├── app.py               # Main application entry point
├── requirements.txt     # Python dependencies
├── README.md            # This file
├── public/              # Frontend static files
│   ├── index.html       # Main HTML file
│   ├── styles.css       # TailwindCSS styles
│   └── app.js           # JavaScript logic
└── utils/               # Utility modules
    ├── model_utils.py   # Model management utilities
    ├── conversation.py  # Conversation management
    └── api_compat.py    # OpenAI API compatibility
```
|
|
|
## Requirements |
|
|
|
- Python 3.8+ |
|
- GPU with CUDA support (recommended) |
|
- 32GB+ RAM (for optimal performance with Qwen3 Coder)
|
|
|
## Installation |
|
|
|
1. Clone this repository: |
|
```bash |
|
git clone <repository-url> |
|
cd <repository-name> |
|
``` |
|
|
|
2. Install dependencies: |
|
```bash |
|
pip install -r requirements.txt |
|
``` |
|
|
|
3. Run the application: |
|
```bash |
|
python app.py |
|
``` |
|
|
|
## Deployment to HuggingFace Spaces |
|
|
|
1. Create a new Space on HuggingFace: |
|
- Go to https://huggingface.co/new-space |
|
- Choose "Gradio" as the Space SDK |
|
- Select GPU hardware (recommended for Qwen3 Coder)
|
|
|
2. Upload files to your Space repository: |
|
- Upload all files from this repository |
|
- Make sure to include the `requirements.txt` file |
|
|
|
3. Configure the Space: |
|
- The Space will automatically detect and install dependencies from `requirements.txt` |
|
- The application will start automatically on port 7860 (see the launch sketch after these steps)
|
|
|
4. Access your deployed application: |
|
- Once the build is complete, your application will be available at the provided URL |
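
For reference, a minimal `app.py` entry point that launches a Gradio app on port 7860 might look like the sketch below. This is illustrative only, with a placeholder response function; it is not the actual application code.

```python
import gradio as gr

def respond(message, history):
    # Placeholder; the real app calls the Qwen model here.
    return f"Echo: {message}"

demo = gr.ChatInterface(respond)
# HuggingFace Spaces routes traffic to port 7860 by default.
demo.launch(server_name="0.0.0.0", server_port=7860)
```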
|
|
|
## API Endpoints |
|
|
|
### OpenAI API-Compatible Endpoint
|
``` |
|
POST /v1/chat/completions |
|
``` |
|
|
|
Request format: |
|
```json |
|
{ |
|
"messages": [ |
|
{"role": "system", "content": "You are a helpful assistant."}, |
|
{"role": "user", "content": "Hello!"} |
|
], |
|
"model": "Qwen/Qwen3-Coder-30B-A3B-Instruct", |
|
"max_tokens": 1024, |
|
"temperature": 0.7 |
|
} |
|
``` |
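
For example, assuming the app is running locally on port 7860 and returns the standard OpenAI chat-completions response shape, the endpoint can be called with `requests` (a minimal sketch; adjust the base URL for your Space):

```python
import requests

# Sketch only: assumes a local deployment on port 7860.
resp = requests.post(
    "http://localhost:7860/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
        "max_tokens": 1024,
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
# Assumes the OpenAI-style response shape implied by the compatibility layer.
print(resp.json()["choices"][0]["message"]["content"])
```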
|
|
|
### Frontend Chat Endpoint |
|
``` |
|
POST /chat |
|
``` |
|
|
|
Request format: |
|
```json |
|
{ |
|
"message": "Hello!", |
|
"history": [ |
|
{"role": "user", "content": "Previous message"}, |
|
{"role": "assistant", "content": "Previous response"} |
|
] |
|
} |
|
``` |
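
A matching client call might look like this (again a sketch assuming a local deployment; the exact response schema is defined by `app.py`):

```python
import requests

resp = requests.post(
    "http://localhost:7860/chat",
    json={
        "message": "Hello!",
        "history": [
            {"role": "user", "content": "Previous message"},
            {"role": "assistant", "content": "Previous response"},
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # response schema is defined by app.py
```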
|
|
|
## Customization |
|
|
|
### Model Configuration |
|
You can customize the model behavior by modifying the parameters in `utils/model_utils.py`: |
|
- `DEFAULT_MAX_TOKENS`: Maximum tokens to generate |
|
- `DEFAULT_TEMPERATURE`: Sampling temperature |
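
For instance, the constants might be defined along these lines (illustrative values, not necessarily the shipped defaults):

```python
# utils/model_utils.py (excerpt; illustrative values)
DEFAULT_MAX_TOKENS = 1024   # upper bound on tokens generated per response
DEFAULT_TEMPERATURE = 0.7   # sampling temperature; lower values are more deterministic
```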
|
|
|
### UI Customization |
|
The UI can be customized by modifying: |
|
- `public/styles.css`: CSS styles with TailwindCSS |
|
- `public/app.js`: JavaScript logic |
|
- `public/index.html`: HTML structure |
|
|
|
## Troubleshooting |
|
|
|
### Common Issues |
|
|
|
1. **Model Loading Errors**: |
|
- Ensure you have sufficient RAM and GPU memory |
|
- Check that the model name is correct in `utils/model_utils.py` |
|
|
|
2. **CUDA Out of Memory**: |
|
- Reduce `DEFAULT_MAX_TOKENS` in `utils/model_utils.py` |
|
- Use a smaller model variant if available |
|
|
|
3. **Dependency Installation Failures**: |
|
- Check the HuggingFace Space logs for specific error messages |
|
- Ensure all dependencies are listed in `requirements.txt` |
|
|
|
### Performance Optimization |
|
|
|
1. **GPU Usage**: |
|
- The application automatically detects and uses CUDA when available (see the first sketch after this list)
|
- For CPU-only environments, performance will be significantly slower |
|
|
|
2. **Caching**: |
|
- Redis is used for caching if available |
|
- In-memory storage is used as a fallback (see the second sketch below)
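
The device-selection logic is essentially the standard PyTorch pattern (a sketch; the real code lives in `utils/model_utils.py` and may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Automatic device selection: fall back to CPU when no GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    torch_dtype=dtype,
    device_map="auto" if device == "cuda" else None,  # "auto" requires accelerate
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Coder-30B-A3B-Instruct")
```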
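
The Redis-with-fallback pattern can be sketched as follows (hypothetical helper names; the actual implementation may differ):

```python
# Try Redis first; fall back to a process-local dict if it is unreachable.
try:
    import redis
    cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
    cache.ping()  # raises if no Redis server is reachable
except Exception:
    cache = None

_memory_cache = {}

def cache_get(key):
    return cache.get(key) if cache is not None else _memory_cache.get(key)

def cache_set(key, value):
    if cache is not None:
        cache.set(key, value, ex=3600)  # expire entries after one hour
    else:
        _memory_cache[key] = value
```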
|
|
|
## Contributing |
|
|
|
1. Fork the repository |
|
2. Create a feature branch |
|
3. Commit your changes |
|
4. Push to the branch |
|
5. Create a Pull Request |
|
|
|
## License |
|
|
|
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. |
|
|
|
## Acknowledgments |
|
|
|
- Qwen team for the Qwen/Qwen3-Coder-30B-A3B-Instruct model |
|
- HuggingFace for providing the platform |
|
- Gradio team for the web interface framework |