# AI Chat Application with Qwen3 Coder
A fully functional AI chat application built for HuggingFace Spaces. It integrates the Qwen/Qwen3-Coder-30B-A3B-Instruct model and exposes OpenAI-compatible API endpoints.
## Features
- **Qwen3 Coder Integration**: Direct integration with the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- **OpenAI API Compatibility**: Implements OpenAI-compatible endpoints for seamless integration
- **Streaming Responses**: Real-time response streaming for interactive chat experience
- **Conversation History**: Persistent conversation history management
- **Modern UI**: Responsive design inspired by Perplexity AI with TailwindCSS
- **Dark/Light Mode**: Support for both dark and light themes
- **Copy Responses**: One-click copying of AI responses
- **Typing Indicators**: Visual indicators for AI response generation
- **GPU Optimization**: Automatic CUDA detection with GPU-accelerated inference when available
- **Error Handling**: Robust error handling with automatic connection recovery
- **Caching**: Response caching via Redis, with an in-memory fallback
## Project Structure
```
/
├── app.py              # Main application entry point
├── requirements.txt    # Python dependencies
├── README.md           # This file
├── public/             # Frontend static files
│   ├── index.html      # Main HTML file
│   ├── styles.css      # TailwindCSS styles
│   └── app.js          # JavaScript logic
└── utils/              # Utility modules
    ├── model_utils.py  # Model management utilities
    ├── conversation.py # Conversation management
    └── api_compat.py   # OpenAI API compatibility
```
## Requirements
- Python 3.8+
- GPU with CUDA support (recommended)
- 32GB+ RAM (for optimal performance with Qwen3 Coder)
## Installation
1. Clone this repository:
   ```bash
   git clone <repository-url>
   cd <repository-name>
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the application:
   ```bash
   python app.py
   ```
## Deployment to HuggingFace Spaces
1. Create a new Space on HuggingFace:
   - Go to https://huggingface.co/new-space
   - Choose "Gradio" as the Space SDK
   - Select GPU hardware (recommended for Qwen3 Coder)
2. Upload files to your Space repository:
   - Upload all files from this repository
   - Make sure to include the `requirements.txt` file
3. Configure the Space:
   - The Space will automatically detect and install dependencies from `requirements.txt`
   - The application will start automatically on port 7860 (see the launch sketch after this list)
4. Access your deployed application:
   - Once the build is complete, your application will be available at the provided URL
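For reference, here is a minimal sketch of how `app.py` might bind to the port Spaces expects. The `respond` handler and the use of `gr.ChatInterface` are illustrative assumptions; the actual entry point may be wired differently.

```python
import gradio as gr

def respond(message, history):
    # Placeholder handler; the real app routes this through the Qwen3 Coder model.
    return f"Echo: {message}"

demo = gr.ChatInterface(respond)

if __name__ == "__main__":
    # HuggingFace Spaces expects the app to listen on 0.0.0.0:7860.
    demo.launch(server_name="0.0.0.0", server_port=7860)
```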
## API Endpoints
### OpenAI API-Compatible Endpoint
```
POST /v1/chat/completions
```
Request format:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "max_tokens": 1024,
  "temperature": 0.7
}
```
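A minimal client sketch using `requests` (the base URL is an assumption; substitute your Space's URL). The response access below assumes the standard OpenAI chat completions schema, which the compatibility layer advertises:

```python
import requests

BASE_URL = "http://localhost:7860"  # assumed; replace with your deployment URL

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
        "max_tokens": 1024,
        "temperature": 0.7,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

If the compatibility layer honors the standard `"stream": true` flag, responses arrive incrementally rather than as a single JSON body.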
### Frontend Chat Endpoint
```
POST /chat
```
Request format:
```json
{
  "message": "Hello!",
  "history": [
    {"role": "user", "content": "Previous message"},
    {"role": "assistant", "content": "Previous response"}
  ]
}
```
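A matching sketch for the frontend endpoint (same assumed base URL; this README does not specify the response schema, so the sketch prints the raw JSON):

```python
import requests

resp = requests.post(
    "http://localhost:7860/chat",  # assumed base URL
    json={
        "message": "Hello!",
        "history": [
            {"role": "user", "content": "Previous message"},
            {"role": "assistant", "content": "Previous response"},
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # response shape is defined by app.py
```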
## Customization
### Model Configuration
You can customize the model behavior by modifying the parameters in `utils/model_utils.py`:
- `DEFAULT_MAX_TOKENS`: Maximum tokens to generate
- `DEFAULT_TEMPERATURE`: Sampling temperature
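As an illustration, the relevant constants might look like the excerpt below. The names come from this README; the values shown are assumptions that mirror the API example above.

```python
# utils/model_utils.py (illustrative excerpt)
DEFAULT_MAX_TOKENS = 1024   # maximum tokens to generate per response
DEFAULT_TEMPERATURE = 0.7   # sampling temperature; lower values are more deterministic
```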
### UI Customization
The UI can be customized by modifying:
- `public/styles.css`: CSS styles with TailwindCSS
- `public/app.js`: JavaScript logic
- `public/index.html`: HTML structure
## Troubleshooting
### Common Issues
1. **Model Loading Errors**:
   - Ensure you have sufficient RAM and GPU memory
   - Check that the model name is correct in `utils/model_utils.py`
2. **CUDA Out of Memory**:
   - Reduce `DEFAULT_MAX_TOKENS` in `utils/model_utils.py`
   - Use a smaller model variant if available, or load in reduced precision (see the loading sketch after this list)
3. **Dependency Installation Failures**:
   - Check the HuggingFace Space logs for specific error messages
   - Ensure all dependencies are listed in `requirements.txt`
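For the memory issues above, a hedged sketch of memory-conscious loading with `transformers` follows. Whether `model_utils.py` does exactly this is an assumption; the library calls themselves are standard (`device_map="auto"` additionally requires the `accelerate` package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

# Mirrors the automatic CUDA detection described under Performance Optimization.
use_cuda = torch.cuda.is_available()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if use_cuda else torch.float32,  # fp16 halves GPU memory
    device_map="auto",  # shards across GPUs / offloads to CPU as needed
)
```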
### Performance Optimization
1. **GPU Usage**:
   - The application automatically detects and uses CUDA if available
   - For CPU-only environments, expect significantly slower inference
2. **Caching**:
   - Redis is used for caching when available, with in-memory storage as a fallback (see the caching sketch after this list)
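A minimal sketch of the Redis-with-fallback pattern (host, key handling, and TTL are illustrative assumptions, not the app's actual scheme):

```python
import redis

# Try Redis first; fall back to a plain dict if the server is unreachable.
try:
    _redis = redis.Redis(host="localhost", port=6379, socket_connect_timeout=1)
    _redis.ping()  # raises ConnectionError if no server is listening
except redis.exceptions.ConnectionError:
    _redis = None

_memory_cache = {}

def cache_get(key: str):
    if _redis is not None:
        value = _redis.get(key)
        return value.decode() if value is not None else None
    return _memory_cache.get(key)

def cache_set(key: str, value: str, ttl: int = 3600):
    if _redis is not None:
        _redis.set(key, value, ex=ttl)  # expire after ttl seconds
    else:
        _memory_cache[key] = value  # note: no expiry in the fallback
```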
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Qwen team for the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- HuggingFace for providing the platform
- Gradio team for the web interface framework