
AI Chat Application with Qwen Coder

This is a fully functional AI chat application built for HuggingFace Spaces, integrating the Qwen/Qwen3-Coder-30B-A3B-Instruct model and exposing OpenAI-compatible API endpoints.

Features

  • Qwen3 Coder Integration: Direct integration with the Qwen/Qwen3-Coder-30B-A3B-Instruct model
  • OpenAI API Compatibility: Implements OpenAI-compatible endpoints for seamless integration with existing clients
  • Streaming Responses: Real-time response streaming for interactive chat experience
  • Conversation History: Persistent conversation history management
  • Modern UI: Responsive design inspired by Perplexity AI with TailwindCSS
  • Dark/Light Mode: Support for both dark and light themes
  • Copy Responses: One-click copying of AI responses
  • Typing Indicators: Visual indicators for AI response generation
  • GPU Optimization: Automatic CUDA detection and GPU acceleration when available
  • Error Handling: Robust error handling with automatic connection recovery
  • Caching: Response caching via Redis when available, with an in-memory fallback

Project Structure

/
├── app.py                 # Main application entry point
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── public/                # Frontend static files
│   ├── index.html         # Main HTML file
│   ├── styles.css         # TailwindCSS styles
│   └── app.js             # JavaScript logic
└── utils/                 # Utility modules
    ├── model_utils.py     # Model management utilities
    ├── conversation.py    # Conversation management
    └── api_compat.py      # OpenAI API compatibility

Requirements

  • Python 3.8+
  • GPU with CUDA support (recommended)
  • 32GB+ RAM (for optimal performance with Qwen3 Coder)

Installation

  1. Clone this repository:

    git clone <repository-url>
    cd <repository-name>
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Run the application:

    python app.py
    

Deployment to HuggingFace Spaces

  1. Create a new Space on HuggingFace, selecting the Gradio SDK.

  2. Upload files to your Space repository:

    • Upload all files from this repository
    • Make sure to include the requirements.txt file
  3. Configure the Space (see the metadata sketch after this list):

    • The Space will automatically detect and install dependencies from requirements.txt
    • The application will start automatically on port 7860
  4. Access your deployed application:

    • Once the build is complete, your application will be available at the provided URL
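
HuggingFace Spaces reads the Space configuration from a YAML metadata block at the top of README.md. A minimal sketch for this project follows; the emoji, colors, and sdk_version shown are illustrative placeholders, not values taken from this repository:

---
title: AI Chat Application with Qwen Coder
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.46.0
app_file: app.py
pinned: false
---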

API Endpoints

OpenAI-Compatible Endpoint

POST /v1/chat/completions

Request format:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "max_tokens": 1024,
  "temperature": 0.7
}
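
Assuming the endpoint returns the standard OpenAI response schema, a minimal Python client might look like this (the base URL is a placeholder; use your Space URL, or http://localhost:7860 for a local run):

import requests

BASE_URL = "http://localhost:7860"  # placeholder: replace with your Space URL

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "max_tokens": 1024,
    "temperature": 0.7,
}

response = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=120)
response.raise_for_status()
# Standard OpenAI schema: the reply text lives in choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])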

Frontend Chat Endpoint

POST /chat

Request format:

{
  "message": "Hello!",
  "history": [
    {"role": "user", "content": "Previous message"},
    {"role": "assistant", "content": "Previous response"}
  ]
}
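
A minimal Python sketch for this endpoint (the base URL is a placeholder, and the exact shape of the JSON response is an assumption; check app.py for the actual fields):

import requests

BASE_URL = "http://localhost:7860"  # placeholder: replace with your Space URL

payload = {
    "message": "Hello!",
    "history": [
        {"role": "user", "content": "Previous message"},
        {"role": "assistant", "content": "Previous response"},
    ],
}

response = requests.post(f"{BASE_URL}/chat", json=payload, timeout=120)
response.raise_for_status()
print(response.json())  # inspect the returned fields; the schema is app-specific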

Customization

Model Configuration

You can customize the model behavior by modifying the parameters in utils/model_utils.py (see the sketch after this list):

  • DEFAULT_MAX_TOKENS: Maximum tokens to generate
  • DEFAULT_TEMPERATURE: Sampling temperature
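
As a rough sketch, the relevant constants in utils/model_utils.py might look like the following (the values shown are illustrative, not necessarily the ones shipped in this repository):

# utils/model_utils.py (sketch; values are illustrative)
DEFAULT_MAX_TOKENS = 1024     # maximum number of tokens to generate per response
DEFAULT_TEMPERATURE = 0.7     # sampling temperature; lower values are more deterministic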

UI Customization

The UI can be customized by modifying:

  • public/styles.css: CSS styles with TailwindCSS
  • public/app.js: JavaScript logic
  • public/index.html: HTML structure

Troubleshooting

Common Issues

  1. Model Loading Errors:

    • Ensure you have sufficient RAM and GPU memory
    • Check that the model name is correct in utils/model_utils.py
  2. CUDA Out of Memory:

    • Reduce DEFAULT_MAX_TOKENS in utils/model_utils.py
    • Use a smaller model variant if available
  3. Dependency Installation Failures:

    • Check the HuggingFace Space logs for specific error messages
    • Ensure all dependencies are listed in requirements.txt

Performance Optimization

  1. GPU Usage:

    • The application automatically detects and uses CUDA if available
    • For CPU-only environments, performance will be significantly slower
  2. Caching:

    • Redis is used for caching if available
    • In-memory storage is used as a fallback (see the sketch after this list)
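
The sketch below illustrates a common pattern for both points; it is illustrative, not the exact code in this repository:

import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Prefer Redis for caching; fall back to an in-memory dict if Redis is
# not installed or the server is unreachable.
try:
    import redis

    cache = redis.Redis(host="localhost", port=6379, db=0)
    cache.ping()  # raises an exception if the server is not running
except Exception:
    cache = {}  # in-memory fallback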

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Qwen team for the Qwen/Qwen3-Coder-30B-A3B-Instruct model
  • HuggingFace for providing the platform
  • Gradio team for the web interface framework