---
title: LLaMA-Omni
emoji: 🦙🎧
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app_gradio_spaces.py
pinned: false
---
# 🦙🎧 LLaMA-Omni: Seamless Speech Interaction with Large Language Models

This is a Gradio deployment of [LLaMA-Omni](https://github.com/ictnlp/LLaMA-Omni), a speech-language model built upon Llama-3.1-8B-Instruct. It supports low-latency and high-quality speech interactions, simultaneously generating both text and speech responses based on speech instructions.
## 💡 Highlights

* 💪 **Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.**
* 🚀 **Low-latency speech interaction with a latency as low as 226 ms.**
* 🎧 **Simultaneous generation of both text and speech responses.**
## 📋 Prerequisites

- Python 3.10+
- PyTorch 2.0+
- CUDA-compatible GPU (for optimal performance)
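To confirm these prerequisites are met before installing anything, a minimal check (assuming PyTorch is already available in your environment) looks like this:

```python
# check_env.py -- quick sanity check for the prerequisites above.
import sys

import torch

assert sys.version_info >= (3, 10), "Python 3.10+ required"
major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) >= (2, 0), "PyTorch 2.0+ required"

print(f"Python {sys.version.split()[0]} | PyTorch {torch.__version__} | "
      f"CUDA available: {torch.cuda.is_available()}")
```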
## 🛠️ Setup

1. Clone this repository:

```bash
git clone https://github.com/your-username/llama-omni.git
cd llama-omni
```

2. Create a virtual environment and install dependencies:

```bash
conda create -n llama-omni python=3.10
conda activate llama-omni
pip install -e .
```

3. Install fairseq:

```bash
pip install git+https://github.com/pytorch/fairseq.git
```

4. Install optional dependencies (flash-attn requires a CUDA GPU, so skip this step on Apple Silicon Macs such as M1/M2):

```bash
# Only run this on machines with a CUDA-capable GPU (not Apple Silicon)
pip install flash-attn
```
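After these steps, a quick import check helps confirm that everything installed cleanly. The module names below are assumptions based on the dependencies installed above (`flash_attn` is expected to be missing on Apple Silicon):

```python
# verify_install.py -- sketch that reports which key dependencies import cleanly.
import importlib

for name in ("torch", "fairseq", "flash_attn"):
    try:
        module = importlib.import_module(name)
        print(f"{name}: OK ({getattr(module, '__version__', 'unknown version')})")
    except ImportError as exc:
        print(f"{name}: missing ({exc})")
```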
## 🐳 Docker Deployment

We provide Docker support for easy deployment without worrying about dependencies:

1. Make sure Docker and Docker Compose are installed on your system
2. Build and run the container:

```bash
# Using the provided shell script
./run_docker.sh

# Or manually with docker-compose
docker-compose up --build
```

3. Access the application at http://localhost:7860

The Docker container will automatically:
- Install all required dependencies
- Download the necessary model files
- Start the application
### GPU Support

The Docker setup includes NVIDIA GPU support. Make sure you have:
- NVIDIA drivers installed on your host
- NVIDIA Container Toolkit installed (for GPU passthrough)
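To verify that GPU passthrough actually works, you can run a short check inside the container (e.g. via `docker-compose exec`, using whatever service name your `docker-compose.yml` defines). This is a minimal sketch, assuming PyTorch is installed in the image:

```python
# gpu_check.py -- confirms the container can see the host GPU.
import torch

if torch.cuda.is_available():
    print(f"CUDA OK: {torch.cuda.device_count()} device(s), "
          f"first is {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device visible -- check the NVIDIA drivers and Container Toolkit")
```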
## 🚀 Hugging Face Spaces Deployment

To deploy on Hugging Face Spaces (Gradio SDK):

1. Create a new Gradio Space
2. Connect this GitHub repository
3. Set the environment requirements (Python 3.10)
4. Deploy!

The app will automatically:
- Download the required models (Whisper, LLaMA-Omni, vocoder)
- Start the controller
- Start the model worker
- Launch the web interface
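For reference, the model download step can be reproduced manually with `huggingface_hub` and `whisper`. This is a sketch only: the repository id, Whisper size, and target directories below are assumptions, so check the upstream LLaMA-Omni project for the exact values:

```python
# download_models.py -- rough sketch of the automatic model download step.
# Repo ids and paths are assumptions; verify against the upstream LLaMA-Omni docs.
import whisper
from huggingface_hub import snapshot_download

# Speech encoder (Whisper) -- load_model fetches the weights on first use.
whisper.load_model("large-v3", download_root="models/speech_encoder")

# LLaMA-Omni weights from the Hugging Face Hub.
snapshot_download(repo_id="ICTNLP/Llama-3.1-8B-Omni",
                  local_dir="models/Llama-3.1-8B-Omni")
```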
## 🖥️ Local Usage

If you want to run the application locally without Docker:

```bash
python app.py
```

This will:
1. Start the controller
2. Start a model worker that loads LLaMA-Omni
3. Launch a web interface

You can then access the interface at http://localhost:8000
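Once `app.py` reports that all three components are up, a quick way to confirm the interface is reachable (assuming the `requests` package is installed) is:

```python
# health_check.py -- minimal check that the local web interface is serving.
import requests

resp = requests.get("http://localhost:8000", timeout=10)
print("Web interface status:", resp.status_code)  # expect 200
```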
## 📝 Example Usage

### Speech-to-Speech

1. Select the "Speech Input" tab
2. Record or upload audio
3. Click "Submit"
4. Receive both text and speech responses

### Text-to-Speech

1. Select the "Text Input" tab
2. Type your message
3. Click "Submit"
4. Receive both text and speech responses
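Beyond the web UI, the same app can be driven programmatically with `gradio_client`. The endpoint name and argument layout below are hypothetical; call `client.view_api()` against your running instance to see the actual signature:

```python
# client_example.py -- sketch of programmatic access to the running app.
from gradio_client import Client

client = Client("http://localhost:7860")  # use :8000 for the non-Docker local app
client.view_api()  # prints the real endpoint names and parameters

# Hypothetical call shape -- adjust to match what view_api() reports:
result = client.predict("path/to/instruction.wav", api_name="/predict")
print(result)
```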
## 🔄 Development

To contribute to this project:

1. Fork the repository
2. Make your changes
3. Submit a pull request
## 📄 LICENSE

This code is released under the Apache-2.0 License. The model is intended for academic research purposes only and may **NOT** be used for commercial purposes.
Original work by Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng. |