LLaMA-Omni: Seamless Speech Interaction with Large Language Models
This is a Gradio deployment of LLaMA-Omni, a speech-language model built upon Llama-3.1-8B-Instruct. It supports low-latency and high-quality speech interactions, simultaneously generating both text and speech responses based on speech instructions.
Highlights
- Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.
- Low-latency speech interaction, with latency as low as 226 ms.
- Simultaneous generation of both text and speech responses.
Prerequisites
- Python 3.10+
- PyTorch 2.0+
- CUDA-compatible GPU (for optimal performance)
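The prerequisites above can be checked before installing anything else. The following is a minimal sketch, not part of this repository: the helper names `parse_major_minor` and `check_prerequisites` are made up for illustration, and the script only reports whether PyTorch is importable, recent enough, and able to see a CUDA device.

```python
import importlib.util
import re
import sys

MIN_PYTHON = (3, 10)  # from the prerequisites list
MIN_TORCH = (2, 0)    # from the prerequisites list


def parse_major_minor(version):
    """Turn a version string such as '2.3.1+cu121' into (2, 3)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_prerequisites():
    """Return a dict mapping each prerequisite to True or False."""
    results = {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "torch": False,
        "cuda": False,
    }
    # Only import torch if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        results["torch"] = parse_major_minor(torch.__version__) >= MIN_TORCH
        results["cuda"] = torch.cuda.is_available()
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

A `cuda: missing or too old` line does not necessarily block the app from starting, since the GPU is listed for optimal performance rather than as a hard requirement.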
Setup
Clone this repository:
git clone https://github.com/your-username/llama-omni.git
cd llama-omni
Create a virtual environment and install dependencies:
conda create -n llama-omni python=3.10
conda activate llama-omni
pip install -e .
Install fairseq:
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install -e . --no-build-isolation
Install flash-attention:
pip install flash-attn --no-build-isolation
Deployment
This repository is configured for deployment as a Gradio app on Hugging Face Spaces. The model weights and required components are downloaded automatically on first startup.
Hugging Face Spaces Deployment
To deploy on Hugging Face Spaces:
- Create a new Space with the Gradio SDK
- Connect this GitHub repository
- Set the environment requirements (Python 3.10)
- Deploy!
The app will automatically:
- Download the required models (Whisper, LLaMA-Omni, vocoder)
- Start the controller
- Start the model worker
- Launch the web interface
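The ordering of the last three steps can be illustrated with a small process launcher. This is only a sketch of the idea, not the code in `app.py`: the commands in `STEPS` are harmless placeholders that print a message and exit, and `start_services` and `stop_services` are hypothetical helper names. In a real deployment each placeholder would be replaced by the actual command for that component.

```python
import subprocess
import sys

# Placeholder commands (assumptions for illustration only). Each one just
# prints a line and exits; the real entry points are defined by app.py.
STEPS = [
    ("controller", [sys.executable, "-c", "print('controller started')"]),
    ("model_worker", [sys.executable, "-c", "print('model worker started')"]),
    ("web_interface", [sys.executable, "-c", "print('web interface started')"]),
]


def start_services(steps):
    """Start each (name, argv) pair as a child process, in the given order."""
    procs = []
    try:
        for name, argv in steps:
            procs.append((name, subprocess.Popen(argv)))
    except OSError:
        # If one service fails to launch, stop the ones already running.
        stop_services(procs)
        raise
    return procs


def stop_services(procs):
    """Stop services in reverse start order and wait for each to exit."""
    for _name, proc in reversed(procs):
        if proc.poll() is None:
            proc.terminate()
        proc.wait()
```

Starting the controller first matters because the worker and the web interface typically need something to register with or query; stopping in reverse order avoids leaving a component pointing at a peer that has already exited.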
Local Usage
If you want to run the application locally:
python app.py
This will:
- Start the controller
- Start a model worker that loads LLaMA-Omni
- Launch a web interface
You can then access the interface at: http://localhost:8000
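The interface may not respond immediately after `python app.py` is launched, since the models have to be loaded first. A minimal sketch of a readiness check, using only the standard library, is shown below; `wait_for_port` is a hypothetical helper and the 8000 port is taken from the URL above.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8000)` blocks until the port accepts connections or the timeout expires. An open port only shows that a server is listening, not that the model has finished loading.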
Example Usage
Speech-to-Speech
- Select the "Speech Input" tab
- Record or upload audio
- Click "Submit"
- Receive both text and speech responses
Text-to-Speech
- Select the "Text Input" tab
- Type your message
- Click "Submit"
- Receive both text and speech responses
Development
To contribute to this project:
- Fork the repository
- Make your changes
- Submit a pull request
LICENSE
This code is released under the Apache-2.0 License. The model is intended for academic research purposes only and may NOT be used for commercial purposes.
Original work by Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng.