# AI Chat Application with Qwen3 Coder
A fully functional AI chat application built for HuggingFace Spaces. It integrates the Qwen/Qwen3-Coder-30B-A3B-Instruct model and exposes OpenAI-compatible API endpoints.
## Features
- **Qwen3 Coder Integration**: Direct integration with the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- **OpenAI API Compatibility**: Implements OpenAI-compatible endpoints for seamless integration
- **Streaming Responses**: Real-time response streaming for interactive chat experience
- **Conversation History**: Persistent conversation history management
- **Modern UI**: Responsive design inspired by Perplexity AI with TailwindCSS
- **Dark/Light Mode**: Support for both dark and light themes
- **Copy Responses**: One-click copying of AI responses
- **Typing Indicators**: Visual indicators for AI response generation
- **GPU Optimization**: Automatic CUDA detection with GPU-accelerated inference when available
- **Error Handling**: Robust error handling with automatic connection recovery
- **Caching**: Response caching via Redis, with an in-memory fallback
## Project Structure
```
/
├── app.py              # Main application entry point
├── requirements.txt    # Python dependencies
├── README.md           # This file
├── public/             # Frontend static files
│   ├── index.html      # Main HTML file
│   ├── styles.css      # TailwindCSS styles
│   └── app.js          # JavaScript logic
└── utils/              # Utility modules
    ├── model_utils.py  # Model management utilities
    ├── conversation.py # Conversation management
    └── api_compat.py   # OpenAI API compatibility
```
## Requirements
- Python 3.8+
- GPU with CUDA support (recommended)
- 32GB+ RAM (for optimal performance with Qwen3 Coder)
## Installation
1. Clone this repository:
   ```bash
   git clone <repository-url>
   cd <repository-name>
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the application:
   ```bash
   python app.py
   ```
## Deployment to HuggingFace Spaces
1. Create a new Space on HuggingFace:
   - Go to https://huggingface.co/new-space
   - Choose "Gradio" as the Space SDK
   - Select GPU hardware (recommended for Qwen3 Coder)
2. Upload files to your Space repository:
   - Upload all files from this repository
   - Make sure to include the `requirements.txt` file
3. Configure the Space:
   - The Space will automatically detect and install dependencies from `requirements.txt`
   - The application will start automatically on port 7860 (see the launch sketch after this list)
4. Access your deployed application:
   - Once the build is complete, your application will be available at the provided URL
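For reference, here is a minimal sketch of how `app.py` might bind to the port Spaces expects. The `respond` handler and the use of `gr.ChatInterface` are illustrative assumptions; the actual entry point may be wired differently.

```python
import gradio as gr

def respond(message, history):
    # Placeholder handler; the real app routes this through the Qwen3 Coder model.
    return f"Echo: {message}"

demo = gr.ChatInterface(respond)

if __name__ == "__main__":
    # HuggingFace Spaces expects the app to listen on 0.0.0.0:7860.
    demo.launch(server_name="0.0.0.0", server_port=7860)
```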
## API Endpoints
### OpenAI API-Compatible Endpoint
```
POST /v1/chat/completions
```
Request format:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "max_tokens": 1024,
  "temperature": 0.7
}
```
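A minimal client sketch using `requests` (the base URL is an assumption; substitute your Space's URL). The response access below assumes the standard OpenAI chat completions schema, which the compatibility layer advertises:

```python
import requests

BASE_URL = "http://localhost:7860"  # assumed; replace with your deployment URL

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
        "max_tokens": 1024,
        "temperature": 0.7,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

If the compatibility layer honors the standard `"stream": true` flag, responses arrive incrementally rather than as a single JSON body.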
### Frontend Chat Endpoint
```
POST /chat
```
Request format:
```json
{
  "message": "Hello!",
  "history": [
    {"role": "user", "content": "Previous message"},
    {"role": "assistant", "content": "Previous response"}
  ]
}
```
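A matching sketch for the frontend endpoint (same assumed base URL; this README does not specify the response schema, so the sketch prints the raw JSON):

```python
import requests

resp = requests.post(
    "http://localhost:7860/chat",  # assumed base URL
    json={
        "message": "Hello!",
        "history": [
            {"role": "user", "content": "Previous message"},
            {"role": "assistant", "content": "Previous response"},
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # response shape is defined by app.py
```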
## Customization
### Model Configuration
You can customize the model behavior by modifying the parameters in `utils/model_utils.py`:
- `DEFAULT_MAX_TOKENS`: Maximum tokens to generate
- `DEFAULT_TEMPERATURE`: Sampling temperature
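As an illustration, the relevant constants might look like the excerpt below. The names come from this README; the values shown are assumptions that mirror the API example above.

```python
# utils/model_utils.py (illustrative excerpt)
DEFAULT_MAX_TOKENS = 1024   # maximum tokens to generate per response
DEFAULT_TEMPERATURE = 0.7   # sampling temperature; lower values are more deterministic
```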
### UI Customization
The UI can be customized by modifying:
- `public/styles.css`: CSS styles with TailwindCSS
- `public/app.js`: JavaScript logic
- `public/index.html`: HTML structure
## Troubleshooting
### Common Issues
1. **Model Loading Errors**:
   - Ensure you have sufficient RAM and GPU memory
   - Check that the model name is correct in `utils/model_utils.py`
2. **CUDA Out of Memory**:
   - Reduce `DEFAULT_MAX_TOKENS` in `utils/model_utils.py`
   - Use a smaller model variant if available, or load in reduced precision (see the loading sketch after this list)
3. **Dependency Installation Failures**:
   - Check the HuggingFace Space logs for specific error messages
   - Ensure all dependencies are listed in `requirements.txt`
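For the memory issues above, a hedged sketch of memory-conscious loading with `transformers` follows. Whether `model_utils.py` does exactly this is an assumption; the library calls themselves are standard (`device_map="auto"` additionally requires the `accelerate` package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

# Mirrors the automatic CUDA detection described under Performance Optimization.
use_cuda = torch.cuda.is_available()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if use_cuda else torch.float32,  # fp16 halves GPU memory
    device_map="auto",  # shards across GPUs / offloads to CPU as needed
)
```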
### Performance Optimization
1. **GPU Usage**:
   - The application automatically detects and uses CUDA if available
   - For CPU-only environments, expect significantly slower inference
2. **Caching**:
   - Redis is used for caching when available, with in-memory storage as a fallback (see the caching sketch after this list)
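A minimal sketch of the Redis-with-fallback pattern (host, key handling, and TTL are illustrative assumptions, not the app's actual scheme):

```python
import redis

# Try Redis first; fall back to a plain dict if the server is unreachable.
try:
    _redis = redis.Redis(host="localhost", port=6379, socket_connect_timeout=1)
    _redis.ping()  # raises ConnectionError if no server is listening
except redis.exceptions.ConnectionError:
    _redis = None

_memory_cache = {}

def cache_get(key: str):
    if _redis is not None:
        value = _redis.get(key)
        return value.decode() if value is not None else None
    return _memory_cache.get(key)

def cache_set(key: str, value: str, ttl: int = 3600):
    if _redis is not None:
        _redis.set(key, value, ex=ttl)  # expire after ttl seconds
    else:
        _memory_cache[key] = value  # note: no expiry in the fallback
```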
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Qwen team for the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- HuggingFace for providing the platform
- Gradio team for the web interface framework