
AI Chat Application with Qwen Coder

This is a fully functional AI chat application built for HuggingFace Spaces, integrating the Qwen/Qwen3-Coder-30B-A3B-Instruct model and exposing OpenAI-compatible API endpoints.

Features

  • Qwen3 Coder Integration: Direct integration with the Qwen/Qwen3-Coder-30B-A3B-Instruct model
  • OpenAI API Compatibility: Implements OpenAI-compatible endpoints for seamless integration with existing clients
  • Streaming Responses: Real-time response streaming for interactive chat experience
  • Conversation History: Persistent conversation history management
  • Modern UI: Responsive design inspired by Perplexity AI with TailwindCSS
  • Dark/Light Mode: Support for both dark and light themes
  • Copy Responses: One-click copying of AI responses
  • Typing Indicators: Visual indicators for AI response generation
  • GPU Optimization: Automatic CUDA detection and GPU acceleration when available
  • Error Handling: Robust error handling with automatic connection recovery
  • Caching: Response caching via Redis when available, with an in-memory fallback

Project Structure

/
├── app.py                 # Main application entry point
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── public/                # Frontend static files
│   ├── index.html         # Main HTML file
│   ├── styles.css         # TailwindCSS styles
│   └── app.js             # JavaScript logic
└── utils/                 # Utility modules
    ├── model_utils.py     # Model management utilities
    ├── conversation.py    # Conversation management
    └── api_compat.py      # OpenAI API compatibility

Requirements

  • Python 3.8+
  • GPU with CUDA support (recommended)
  • 32GB+ RAM (for optimal performance with Qwen3 Coder)

Installation

  1. Clone this repository:

    git clone <repository-url>
    cd <repository-name>
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Run the application:

    python app.py
    

Deployment to HuggingFace Spaces

  1. Create a new Space on HuggingFace, selecting the Gradio SDK.

  2. Upload files to your Space repository:

    • Upload all files from this repository
    • Make sure to include the requirements.txt file
  3. Configure the Space (see the metadata sketch after this list):

    • The Space will automatically detect and install dependencies from requirements.txt
    • The application will start automatically on port 7860
  4. Access your deployed application:

    • Once the build is complete, your application will be available at the provided URL
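
HuggingFace Spaces reads the Space configuration from a YAML metadata block at the top of README.md. A minimal sketch for this project follows; the emoji, colors, and sdk_version shown are illustrative placeholders, not values taken from this repository:

---
title: AI Chat Application with Qwen Coder
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.46.0
app_file: app.py
pinned: false
---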

API Endpoints

OpenAI-Compatible Endpoint

POST /v1/chat/completions

Request format:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "max_tokens": 1024,
  "temperature": 0.7
}
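
Assuming the endpoint returns the standard OpenAI response schema, a minimal Python client might look like this (the base URL is a placeholder; use your Space URL, or http://localhost:7860 for a local run):

import requests

BASE_URL = "http://localhost:7860"  # placeholder: replace with your Space URL

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "max_tokens": 1024,
    "temperature": 0.7,
}

response = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=120)
response.raise_for_status()
# Standard OpenAI schema: the reply text lives in choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])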

Frontend Chat Endpoint

POST /chat

Request format:

{
  "message": "Hello!",
  "history": [
    {"role": "user", "content": "Previous message"},
    {"role": "assistant", "content": "Previous response"}
  ]
}
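
A minimal Python sketch for this endpoint (the base URL is a placeholder, and the exact shape of the JSON response is an assumption; check app.py for the actual fields):

import requests

BASE_URL = "http://localhost:7860"  # placeholder: replace with your Space URL

payload = {
    "message": "Hello!",
    "history": [
        {"role": "user", "content": "Previous message"},
        {"role": "assistant", "content": "Previous response"},
    ],
}

response = requests.post(f"{BASE_URL}/chat", json=payload, timeout=120)
response.raise_for_status()
print(response.json())  # inspect the returned fields; the schema is app-specific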

Customization

Model Configuration

You can customize the model behavior by modifying the parameters in utils/model_utils.py (see the sketch after this list):

  • DEFAULT_MAX_TOKENS: Maximum tokens to generate
  • DEFAULT_TEMPERATURE: Sampling temperature
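
As a rough sketch, the relevant constants in utils/model_utils.py might look like the following (the values shown are illustrative, not necessarily the ones shipped in this repository):

# utils/model_utils.py (sketch; values are illustrative)
DEFAULT_MAX_TOKENS = 1024     # maximum number of tokens to generate per response
DEFAULT_TEMPERATURE = 0.7     # sampling temperature; lower values are more deterministic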

UI Customization

The UI can be customized by modifying:

  • public/styles.css: CSS styles with TailwindCSS
  • public/app.js: JavaScript logic
  • public/index.html: HTML structure

Troubleshooting

Common Issues

  1. Model Loading Errors:

    • Ensure you have sufficient RAM and GPU memory
    • Check that the model name is correct in utils/model_utils.py
  2. CUDA Out of Memory:

    • Reduce DEFAULT_MAX_TOKENS in utils/model_utils.py
    • Use a smaller model variant if available
  3. Dependency Installation Failures:

    • Check the HuggingFace Space logs for specific error messages
    • Ensure all dependencies are listed in requirements.txt

Performance Optimization

  1. GPU Usage:

    • The application automatically detects and uses CUDA if available
    • For CPU-only environments, performance will be significantly slower
  2. Caching:

    • Redis is used for caching if available
    • In-memory storage is used as a fallback (see the sketch after this list)
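
The sketch below illustrates a common pattern for both points; it is illustrative, not the exact code in this repository:

import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Prefer Redis for caching; fall back to an in-memory dict if Redis is
# not installed or the server is unreachable.
try:
    import redis

    cache = redis.Redis(host="localhost", port=6379, db=0)
    cache.ping()  # raises an exception if the server is not running
except Exception:
    cache = {}  # in-memory fallback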

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Qwen team for the Qwen/Qwen3-Coder-30B-A3B-Instruct model
  • HuggingFace for providing the platform
  • Gradio team for the web interface framework