🦙🎧 LLaMA-Omni: Seamless Speech Interaction with Large Language Models

This is a Gradio deployment of LLaMA-Omni, a speech-language model built on Llama-3.1-8B-Instruct. It supports low-latency, high-quality speech interaction and generates text and speech responses simultaneously from spoken instructions.

💡 Highlights

💪 Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.
🚀 Low-latency speech interaction, with latency as low as 226 ms.
🎧 Simultaneous generation of both text and speech responses.

📋 Prerequisites

Python 3.10+
PyTorch 2.0+
CUDA-compatible GPU (for optimal performance)
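
The requirements above can be checked before installing anything else. The following is a minimal sketch, not part of this repository: the helper names `parse_version` and `check_prerequisites` are made up for illustration, and the thresholds simply mirror the list above.

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Turn a version string such as '2.1.0+cu121' into a (major, minor) tuple."""
    parts = []
    for piece in text.split("+")[0].split(".")[:2]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 2:  # pad '2' to (2, 0) so tuple comparison behaves
        parts.append(0)
    return tuple(parts)


def check_prerequisites(min_python=(3, 10), min_torch=(2, 0)):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    found = sys.version_info[:2]
    if found < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {found[0]}.{found[1]}"
        )
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems
    import torch  # imported lazily so the check itself runs without PyTorch

    if parse_version(str(torch.__version__)) < min_torch:
        problems.append(
            f"PyTorch {min_torch[0]}.{min_torch[1]}+ required, "
            f"found {torch.__version__}"
        )
    if not torch.cuda.is_available():
        problems.append("no CUDA-compatible GPU detected; performance will likely suffer")
    return problems
```

Calling `check_prerequisites()` and printing each returned string gives a quick report; an empty list suggests the environment meets the listed requirements.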

🛠️ Setup

Clone this repository:

git clone https://github.com/your-username/llama-omni.git
cd llama-omni

Create a conda environment and install dependencies:

conda create -n llama-omni python=3.10
conda activate llama-omni
pip install -e .

Install fairseq:

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install -e . --no-build-isolation

Install flash-attention:

pip install flash-attn --no-build-isolation
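
After the installs finish, it can be useful to confirm that the packages are importable from the active environment. This is a small sketch under the assumption that the packages install under their usual import names (`torch`, `fairseq`, `flash_attn`); the helper `missing_modules` is hypothetical and not part of this repository.

```python
import importlib.util


def missing_modules(names=("torch", "fairseq", "flash_attn")):
    """Return the subset of `names` that cannot be found in this environment."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty result from `missing_modules()` indicates that all three packages were found; any name in the returned list points at an install step worth repeating.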

🚀 Deployment

This repository is configured for deployment as a Gradio app. The model weights and other required components are downloaded automatically on first startup.

Hugging Face Spaces Deployment

To deploy on Hugging Face Spaces:

Create a new Space and select Gradio as the SDK
Connect this GitHub repository
Set the environment requirements (Python 3.10)
Deploy!

The app will automatically:

Download the required models (Whisper, LLaMA-Omni, vocoder)
Start the controller
Start the model worker
Launch the web interface
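
The startup order listed above (controller, then model worker, then web interface) can be pictured as launching a series of child processes. The sketch below only illustrates that pattern and is not the code in `app.py`: the function names, the `EXAMPLE_COMMANDS` list, and the `example_app.*` module names are all hypothetical, and the model download step is left out.

```python
import subprocess
import sys
import time


def stop_all(processes):
    """Terminate any still-running children, then wait for all of them."""
    for proc in processes.values():
        if proc.poll() is None:
            proc.terminate()
    for proc in processes.values():
        proc.wait()


def start_in_order(commands, startup_delay=0.0):
    """Launch each (name, argv) pair in sequence and return {name: Popen}."""
    processes = {}
    for name, argv in commands:
        proc = subprocess.Popen(argv)
        processes[name] = proc
        # Crude pause so the next service finds the previous one running.
        time.sleep(startup_delay)
        if proc.poll() not in (None, 0):
            stop_all(processes)
            raise RuntimeError(f"{name} exited early with code {proc.returncode}")
    return processes


# Hypothetical commands; replace them with whatever app.py actually launches.
EXAMPLE_COMMANDS = [
    ("controller", [sys.executable, "-m", "example_app.controller"]),
    ("model_worker", [sys.executable, "-m", "example_app.model_worker"]),
    ("web_interface", [sys.executable, "-m", "example_app.web_server"]),
]
```

A fixed delay between launches is the simplest option; a sturdier variant would poll each service until it responds before starting the next one.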

🖥️ Local Usage

To run the application locally, run the following from the repository root:

python app.py

This will:

Start the controller
Start a model worker that loads LLaMA-Omni
Launch a web interface

You can then access the interface at: http://localhost:8000
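
Because the model has to load before the interface responds, the page may not be reachable immediately. The sketch below polls the address until it answers; `wait_for_server` is a hypothetical helper, and the timeout and interval values are arbitrary assumptions rather than measured startup times.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8000", timeout=60.0, interval=1.0):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied, even though the status was an error code.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet (or the connection timed out); retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_server(timeout=300)` returns `True` as soon as the interface responds and `False` if it is still unreachable after five minutes.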

📝 Example Usage

Speech-to-Speech

Select the "Speech Input" tab
Record or upload audio
Click "Submit"
Receive both text and speech responses

Text-to-Speech

Select the "Text Input" tab
Type your message
Click "Submit"
Receive both text and speech responses

📚 Development

To contribute to this project:

Fork the repository
Make your changes
Submit a pull request

📄 LICENSE

This code is released under the Apache-2.0 License. The model is intended for academic research purposes only and may NOT be used for commercial purposes.

Original work by Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng.