---
title: LLaMA-Omni
emoji: πŸ¦™πŸŽ§
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app_gradio_spaces.py
pinned: false
---
# πŸ¦™πŸŽ§ LLaMA-Omni: Seamless Speech Interaction with Large Language Models
This is a Gradio deployment of [LLaMA-Omni](https://github.com/ictnlp/LLaMA-Omni), a speech-language model built upon Llama-3.1-8B-Instruct. It supports low-latency and high-quality speech interactions, simultaneously generating both text and speech responses based on speech instructions.
## πŸ’‘ Highlights
* πŸ’ͺ **Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.**
* πŸš€ **Low-latency speech interaction with a latency as low as 226ms.**
* 🎧 **Simultaneous generation of both text and speech responses.**
## πŸ“‹ Prerequisites
- Python 3.10+
- PyTorch 2.0+
- CUDA-compatible GPU (for optimal performance)
## πŸ› οΈ Setup
1. Clone this repository:
```bash
git clone https://github.com/your-username/llama-omni.git
cd llama-omni
```
2. Create a virtual environment and install dependencies:
```bash
conda create -n llama-omni python=3.10
conda activate llama-omni
pip install -e .
```
3. Install fairseq:
```bash
pip install git+https://github.com/pytorch/fairseq.git
```
4. Install flash-attn (optional; skip on Apple Silicon Macs, since flash-attn requires CUDA):
```bash
# Only run this if not on Mac with Apple Silicon
pip install flash-attn
```
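The Apple Silicon caveat in step 4 can be sketched as a small shell check (`should_install_flash_attn` is a hypothetical helper for illustration, not part of this repo):

```shell
# Hypothetical helper: flash-attn requires CUDA, which is unavailable on
# Apple Silicon Macs, so the install step should be skipped there.
should_install_flash_attn() {
  if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    return 1  # Apple Silicon: no CUDA, no flash-attn wheels
  fi
  return 0
}

if should_install_flash_attn; then
  echo "platform supports flash-attn; run: pip install flash-attn"
else
  echo "Apple Silicon detected; skipping flash-attn"
fi
```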
## 🐳 Docker Deployment
We provide Docker support so you can deploy without managing dependencies manually:
1. Make sure Docker and Docker Compose are installed on your system
2. Build and run the container:
```bash
# Using the provided shell script
./run_docker.sh
# Or manually with docker-compose
docker-compose up --build
```
3. Access the application at http://localhost:7860
The Docker container will automatically:
- Install all required dependencies
- Download the necessary model files
- Start the application
### GPU Support
The Docker setup includes NVIDIA GPU support. Make sure you have:
- NVIDIA drivers installed on your host
- NVIDIA Container Toolkit installed (for GPU passthrough)
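A minimal preflight sketch for checking both prerequisites (uses the standard `nvidia-smi` and `docker info` commands; the printed labels are illustrative):

```shell
# Check the host driver: nvidia-smi ships with the NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "NVIDIA driver: found"
else
  echo "NVIDIA driver: missing"
fi

# Check the container runtime: the NVIDIA Container Toolkit registers an
# "nvidia" runtime with Docker, which shows up in `docker info`.
if docker info 2>/dev/null | grep -qi nvidia; then
  echo "NVIDIA container runtime: detected"
else
  echo "NVIDIA container runtime: not detected"
fi
```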
## πŸš€ Gradio Spaces Deployment
To deploy on Gradio Spaces:
1. Create a new Gradio Space
2. Connect this GitHub repository
3. Set the environment requirements (Python 3.10)
4. Deploy!
The app will automatically:
- Download the required models (Whisper, LLaMA-Omni, vocoder)
- Start the controller
- Start the model worker
- Launch the web interface
## πŸ–₯️ Local Usage
If you want to run the application locally without Docker:
```bash
python app.py
```
This will:
1. Start the controller
2. Start a model worker that loads LLaMA-Omni
3. Launch a web interface
You can then access the interface at: http://localhost:8000
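Once `app.py` is running, a quick way to confirm the interface is up (assumes `curl` is available; the port matches the address above):

```shell
# Poll the local web interface; -s silences progress, -f fails on HTTP errors.
if curl -sf http://localhost:8000 >/dev/null 2>&1; then
  echo "web interface reachable at http://localhost:8000"
else
  echo "web interface not reachable yet"
fi
```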
## πŸ“ Example Usage
### Speech-to-Speech
1. Select the "Speech Input" tab
2. Record or upload audio
3. Click "Submit"
4. Receive both text and speech responses
### Text-to-Speech
1. Select the "Text Input" tab
2. Type your message
3. Click "Submit"
4. Receive both text and speech responses
## πŸ“š Development
To contribute to this project:
1. Fork the repository
2. Make your changes
3. Submit a pull request
## πŸ“„ LICENSE
This code is released under the Apache-2.0 License. The model is intended for academic research purposes only and may **NOT** be used for commercial purposes.
Original work by Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng.