marcosremar2 committed on
Commit
34b8b49
·
1 Parent(s): c3907b6
.cursor/rules/principal.mdc CHANGED
@@ -3,49 +3,3 @@ description:
3
  globs:
4
  alwaysApply: false
5
  ---
6
- . Ship as few files as possible; the remaining files should be downloaded during initialization
-
- # LLaMA-Omni2 Project Summary for Hugging Face Spaces
-
- I am setting up a demo application for LLaMA-Omni2, a speech-language assistant, so that it can be deployed easily on Hugging Face Spaces. Here is a summary of what has been implemented:
-
- ## Project Goal
- Build an interactive web interface that demonstrates the capabilities of LLaMA-Omni2, letting users interact with the model through text and speech and receive responses in both formats.
-
- ## Main Components
-
- 1. **Gradio Interface**: A user-friendly web UI with two tabs:
- - **Audio Input**: Lets users speak or upload audio files
- - **Text Input**: Supports text-based interactions
-
- 2. **Speech Recognition Pipeline**:
- - Uses the Whisper (tiny) model to transcribe audio to text
- - Configured to load directly from Hugging Face
-
- 3. **Text and Speech Generation**:
- - Uses the LLaMA-Omni2-0.5B model to generate responses
- - Supports two speech generation methods: `generate_with_speech` and `generate_speech`
- - Handles converting text responses into audio
-
- 4. **Optimizations for Hugging Face Spaces**:
- - Dynamic model loading (models are not included in the repository)
- - Configured to use a GPU when available
- - Comprehensive logging for debugging
-
- 5. **Repository Management**:
- - `.gitignore` set up to exclude large models and unnecessary artifacts
- - Large files removed from the git history
- - Clean, organized project structure
-
- ## Main Files
- - `app.py`: Contains the main application logic and the Gradio interface
- - `requirements.txt`: Lists all required dependencies
- - `.huggingface-space`: Configuration for the Hugging Face Spaces environment
- - `.gitignore`: Keeps large and temporary files out of version control
-
- ## Technologies Used
- - **Frameworks**: PyTorch, Transformers, Gradio
- - **Models**: LLaMA-Omni2-0.5B (text/speech), Whisper-tiny (speech recognition)
- - **Infrastructure**: Hugging Face Spaces for hosting
-
- The project is configured to download the models dynamically when deployed, instead of committing them to the repository, which keeps the codebase lean and easy to share and deploy.
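The dynamic download flow described above (check for a local copy, otherwise fetch from the Hub) can be sketched with `huggingface_hub`. This is only a minimal illustration; the helper name and directory layout are assumptions, not the repository's actual `model_downloader.py`:

```python
import os
from huggingface_hub import snapshot_download  # pulls a full model repo from the Hugging Face Hub

MODELS_DIR = os.environ.get("MODELS_DIR", "models")

def ensure_model(repo_id: str, local_name: str) -> str:
    """Download a model repository into models/<local_name> only when it is missing."""
    local_dir = os.path.join(MODELS_DIR, local_name)
    if not os.path.isdir(local_dir):  # already present? then skip the download
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return local_dir  # the app then loads the model from this path

# Example (hypothetical usage): fetch the LLaMA-Omni2 checkpoint on the first run only
llama_path = ensure_model("ICTNLP/LLaMA-Omni2-0.5B", "LLaMA-Omni2-0.5B")
```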
 
3
  globs:
4
  alwaysApply: false
5
  ---
.gitignore CHANGED
@@ -20,12 +20,12 @@ var/
20
  .installed.cfg
21
  *.egg
22
 
23
- # Ambientes virtuais
24
  venv/
25
  ENV/
26
  .env/
27
 
28
- # Modelos e dados
29
  models/
30
  *.pt
31
  *.pth
@@ -40,7 +40,8 @@ models/
40
  whisper-large-v3/
41
  cosy2_decoder/
42
  speech_encoder/
43
- # Excluir todos os arquivos grandes de modelos de forma explícita
 
44
  flow.decoder.estimator.fp32.onnx
45
  flow.decoder.estimator.fp16.A10.plan
46
  flow.encoder.fp32.zip
@@ -58,8 +59,9 @@ model.safetensors.index.fp32.json
58
  .idea/
59
  *.swp
60
  *.swo
 
61
 
62
- # Sistema operacional
63
  .DS_Store
64
  Thumbs.db
65
 
@@ -67,7 +69,7 @@ Thumbs.db
67
  logs/
68
  *.log
69
 
70
- # Arquivos grandes
71
  *.dylib
72
  *.js.map
73
  *.so
 
20
  .installed.cfg
21
  *.egg
22
 
23
+ # Environments
24
  venv/
25
  ENV/
26
  .env/
27
 
28
+ # Model files and data
29
  models/
30
  *.pt
31
  *.pth
 
40
  whisper-large-v3/
41
  cosy2_decoder/
42
  speech_encoder/
43
+
44
+ # Ignore all large model files
45
  flow.decoder.estimator.fp32.onnx
46
  flow.decoder.estimator.fp16.A10.plan
47
  flow.encoder.fp32.zip
 
59
  .idea/
60
  *.swp
61
  *.swo
62
+ .cursor/
63
 
64
+ # OS
65
  .DS_Store
66
  Thumbs.db
67
 
 
69
  logs/
70
  *.log
71
 
72
+ # Large files
73
  *.dylib
74
  *.js.map
75
  *.so
.huggingface-space DELETED
@@ -1,9 +0,0 @@
1
- name: llama-omni
2
- sdk: gradio
3
- sdk_version: 5.29.0
4
- python_version: "3.10"
5
- gpu: true
6
- hardware: a100-sxm
7
- datasets:
8
- - openai/whisper-tiny
9
- - ICTNLP/LLaMA-Omni2-0.5B
Dockerfile DELETED
@@ -1,47 +0,0 @@
1
- FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
-
- WORKDIR /app
-
- # Install system dependencies
- RUN apt-get update && apt-get install -y \
- git \
- wget \
- ffmpeg \
- libsndfile1 \
- build-essential \
- ninja-build \
- && rm -rf /var/lib/apt/lists/*
-
- # Copy the source files
- COPY . .
-
- # Prepare the models directory
- RUN mkdir -p models
-
- # Install Python requirements
- RUN pip install --no-cache-dir -r requirements.txt
-
- # Install LLaMA-Omni2 directly (if present)
- RUN if [ -d "./LLaMA-Omni2" ]; then \
- cd LLaMA-Omni2 && \
- pip install -e . \
- ; fi
-
- # Install fairseq if needed
- RUN pip install gitpython
- RUN if [ -d "./LLaMA-Omni2" ]; then \
- pip install fairseq --no-build-isolation \
- ; fi
-
- # Try to install flash-attention (tolerating failure)
- RUN pip install flash-attn --no-build-isolation || echo "Failed to install flash-attn, continuing without it"
-
- # Expose the Gradio port
- EXPOSE 7860
-
- # Set environment variables
- ENV PYTHONUNBUFFERED=1
- ENV MODELS_DIR=/app/models
-
- # Command to start the server
- CMD ["python", "app.py"]
README.md CHANGED
@@ -1,162 +1,106 @@
1
- ---
- title: LLaMA-Omni Demo
- emoji: 🚀
- colorFrom: indigo
- colorTo: green
- sdk: gradio
- python_version: "3.10"
- sdk_version: "5.29.0"
- app_file: app.py
- pinned: false
- # Consider adding hardware if needed (GPU)
- # e.g.: hardware: nvidia-t4
- ---
-
- # LLaMA-Omni2 Interface
-
- Interface for the LLaMA-Omni2 model, providing audio input and output with natural language processing.
-
- ## Features
-
- - Audio transcription using Whisper
- - Text processing with LLaMA-Omni2
- - Speech synthesis using CosyVoice 2
- - Real-time text and speech generation
- - Automatic model download during initialization
-
- ## Requirements
-
- - Python 3.8+
- - PyTorch 2.0+
- - Transformers 4.36+
- - Gradio 3.50+
- - CUDA (optional, but recommended for better performance)
-
- ## Model Setup
-
- This project downloads models automatically during initialization, avoiding the need to store large files in the Git repository.
-
- The models are downloaded automatically on the first run:
-
- - **Whisper Large V3** - Speech recognition model
- - **CosyVoice 2** - Vocoder for speech synthesis
- - **LLaMA-Omni2** - Multimodal language model
-
- All models are stored in the `models/` folder, which is listed in `.gitignore` to avoid committing large files.
-
- ## Setup
-
- 1. Clone the repository:
- ```bash
- git clone https://github.com/seu-usuario/llama-omni2.git
- cd llama-omni2
- ```
-
- 2. Install the dependencies:
- ```bash
- pip install -r requirements.txt
- ```
-
- 3. Run the application:
- ```bash
- python app.py
- ```
-
- On the first run, the models are downloaded automatically. This can take some time depending on your internet connection.
-
- ## Usage
-
- After starting the application, open the web interface at http://localhost:7860 to interact with the model.
-
- - **Audio Input**: Record or upload an audio file
- - **Text Output**: View the transcription and the model's response
- - **Audio Output**: Listen to the synthesized response
-
- ## Using the launcher
-
- You can also use the launcher to start the full application:
-
- ```bash
- python launch_llama_omni2.py
- ```
-
- Launcher options:
- - `--skip-download`: Skips downloading the dependencies
- - `--extraction-dir`: Sets the extraction directory (default: extraction_dir)
- - `--models-dir`: Sets the models directory (default: models)
- - `--controller-only`: Starts only the controller
- - `--worker-only`: Starts only the model worker
- - `--gradio-only`: Starts only the Gradio interface
-
- ## Project Structure
-
- - `app.py` - Main Gradio application
- - `audio_interface.py` - Audio interface for LLaMA-Omni2
- - `launch_llama_omni2.py` - Script that launches all the components
- - `model_downloader.py` - Automatic model download system
- - `models/` - Directory for the downloaded models
- - `requirements.txt` - Project dependencies
-
- ## How the Automatic Download Works
-
- The automatic download system works as follows:
-
- 1. On startup, the script checks whether the required models exist locally
- 2. If a model is not found, it is downloaded automatically from the Hugging Face Hub
- 3. After the download, the model is loaded by the application as usual
-
- This makes it possible to:
- - Keep the Git repository light, without large files
- - Simplify deployment across different environments
- - Ensure users always have the correct models
-
- ## No-Download Mode
-
- This project supports a "no download" mode that uses the models directly from the Hugging Face Hub without downloading them locally. This is useful for:
-
- - Development and testing where the full models are not needed
- - Environments with limited disk space
- - Continuous integration and deployment scenarios where models are accessed remotely
-
- To enable no-download mode, you can:
-
- 1. **Use the no_download.py Python script (recommended)**:
- ```bash
- # Run app.py without downloads
- python no_download.py app.py
-
- # Run another script without downloads
- python no_download.py audio_interface.py
- ```
-
- 2. **Use the helper script**:
- ```bash
- ./run_without_downloads.sh
- ```
-
- 3. **Set the environment variable**:
- ```bash
- export NO_DOWNLOAD=1
- python app.py
- ```
-
- 4. **Use the launcher command-line option**:
- ```bash
- python launch_llama_omni2.py --no-model-download
- ```
-
- In no-download mode, the application uses the models directly from the Hugging Face Hub without storing files locally. This can be slower for sustained use, but it starts faster and takes up no disk space.
-
- ## Contributing
-
- Contributions are welcome! Please follow these guidelines:
-
- 1. Fork the repository
- 2. Create a branch for your feature (`git checkout -b feature/new-feature`)
- 3. Commit your changes (`git commit -am 'Add new feature'`)
- 4. Push to the branch (`git push origin feature/new-feature`)
- 5. Open a new Pull Request
-
- ## License
-
- This project is licensed under the terms of the MIT license.
1
+ # 🦙🎧 LLaMA-Omni: Seamless Speech Interaction with Large Language Models
2
 
3
+ This is a Gradio deployment of [LLaMA-Omni](https://github.com/ictnlp/LLaMA-Omni), a speech-language model built upon Llama-3.1-8B-Instruct. It supports low-latency and high-quality speech interactions, simultaneously generating both text and speech responses based on speech instructions.
4
 
5
+ ## 💡 Highlights
6
 
7
+ * 💪 **Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.**
8
+ * 🚀 **Low-latency speech interaction with a latency as low as 226ms.**
9
+ * 🎧 **Simultaneous generation of both text and speech responses.**
 
 
 
 
 
 
 
10
 
11
+ ## 📋 Prerequisites
 
 
 
12
 
13
+ - Python 3.10+
14
+ - PyTorch 2.0+
15
+ - CUDA-compatible GPU (for optimal performance)
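A quick way to verify these prerequisites from Python before launching the app (a small sketch; the thresholds simply mirror the list above):

```python
import sys
import torch  # PyTorch 2.0+ expected

# Confirm the interpreter, PyTorch version, and GPU visibility match the prerequisites
assert sys.version_info >= (3, 10), "Python 3.10+ is required"
assert int(torch.__version__.split(".")[0]) >= 2, "PyTorch 2.0+ is required"

if torch.cuda.is_available():
    print(f"CUDA GPU detected: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA GPU detected; the demo will fall back to CPU and run slowly")
```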
16
 
17
+ ## 🛠️ Setup
18
 
19
+ 1. Clone this repository:
20
+ ```bash
21
+ git clone https://github.com/your-username/llama-omni.git
22
+ cd llama-omni
23
+ ```
24
 
25
+ 2. Create a virtual environment and install dependencies:
26
+ ```bash
27
+ conda create -n llama-omni python=3.10
28
+ conda activate llama-omni
29
+ pip install -e .
30
+ ```
31
 
32
+ 3. Install fairseq:
33
+ ```bash
34
+ git clone https://github.com/pytorch/fairseq
35
+ cd fairseq
36
+ pip install -e . --no-build-isolation
37
+ ```
38
 
39
+ 4. Install flash-attention:
40
+ ```bash
41
+ pip install flash-attn --no-build-isolation
42
+ ```
43
 
44
+ ## 🚀 Deployment
 
 
45
 
46
+ This repository is configured for deployment on Gradio. The model weights and required components will be downloaded automatically during the first initialization.
 
 
 
 
 
 
47
 
48
+ ### Gradio Spaces Deployment
49
 
50
+ To deploy on Gradio Spaces:
 
 
 
 
 
51
 
52
+ 1. Create a new Gradio Space
53
+ 2. Connect this GitHub repository
54
+ 3. Set the environment requirements (Python 3.10)
55
+ 4. Deploy!
56
 
57
+ The app will automatically:
58
+ - Download the required models (Whisper, LLaMA-Omni, vocoder)
59
+ - Start the controller
60
+ - Start the model worker
61
+ - Launch the web interface
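The startup sequence above relies on fixed `time.sleep` waits in `app.py`; a deployment can instead poll the controller until it responds. A minimal sketch of that check (port and `/status` endpoint follow the configuration used in `app.py`):

```python
import time
import requests

CONTROLLER_URL = "http://localhost:10000"  # controller port configured in app.py

def wait_for_controller(timeout_s: int = 60) -> bool:
    """Poll the controller's /status endpoint instead of sleeping for a fixed time."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(f"{CONTROLLER_URL}/status", timeout=2).json().get("status") == "ok":
                return True
        except requests.RequestException:
            pass  # controller not reachable yet, keep waiting
        time.sleep(1)
    return False

# e.g. start the model worker only once wait_for_controller() returns True
```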
62
 
63
+ ## 🖥️ Local Usage
 
 
64
 
65
+ If you want to run the application locally:
 
 
 
66
 
67
+ ```bash
68
+ python app.py
69
+ ```
70
 
71
+ This will:
72
+ 1. Start the controller
73
+ 2. Start a model worker that loads LLaMA-Omni
74
+ 3. Launch a web interface
75
 
76
+ You can then access the interface at: http://localhost:8000
 
 
77
 
78
+ ## 📝 Example Usage
79
 
80
+ ### Speech-to-Speech
 
 
 
 
 
 
 
81
 
82
+ 1. Select the "Speech Input" tab
83
+ 2. Record or upload audio
84
+ 3. Click "Submit"
85
+ 4. Receive both text and speech responses
86
 
87
+ ### Text-to-Speech
 
 
 
 
88
 
89
+ 1. Select the "Text Input" tab
90
+ 2. Type your message
91
+ 3. Click "Submit"
92
+ 4. Receive both text and speech responses
93
 
94
+ ## 📚 Development
95
 
96
+ To contribute to this project:
97
 
98
+ 1. Fork the repository
99
+ 2. Make your changes
100
+ 3. Submit a pull request
101
 
102
+ ## 📄 LICENSE
 
 
 
 
103
 
104
+ This code is released under the Apache-2.0 License. The model is intended for academic research purposes only and may **NOT** be used for commercial purposes.
105
 
106
+ Original work by Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng.
SETUP_INSTRUCTIONS.md ADDED
@@ -0,0 +1,67 @@
1
+ # LLaMA-Omni Setup Instructions
2
+
3
+ This repository contains the code structure for deploying LLaMA-Omni on Gradio. The actual model files will be downloaded automatically during deployment.
4
+
5
+ ## Repository Structure
6
+
7
+ ```
8
+ llama-omni/
9
+ ├── app.py # Main application entry point
10
+ ├── app_gradio_spaces.py # Entry point for Gradio Spaces
11
+ ├── check_setup.py # Checks if the environment is properly set up
12
+ ├── cog.yaml # Configuration for Cog (container deployment)
13
+ ├── gradio_app.py # Simplified Gradio app for testing
14
+ ├── predict.py # Predictor for Cog deployment
15
+ ├── pyproject.toml # Project configuration
16
+ ├── requirements.txt # Dependencies for pip
17
+ ├── README.md # Project documentation
18
+ ├── SETUP_INSTRUCTIONS.md # This file
19
+ └── omni_speech/ # Main package
20
+ ├── __init__.py
21
+ ├── infer/ # Inference code
22
+ │ ├── __init__.py
23
+ │ ├── examples/ # Example inputs
24
+ │ │ └── example.json
25
+ │ ├── inference.py # Inference logic
26
+ │ └── run.sh # Script for running inference
27
+ └── serve/ # Serving code
28
+ ├── __init__.py
29
+ ├── controller.py # Controller for managing workers
30
+ ├── model_worker.py # Worker for serving the model
31
+ └── gradio_web_server.py # Gradio web interface
32
+ ```
33
+
34
+ ## Deployment Options
35
+
36
+ 1. **Gradio Spaces**:
37
+ - Connect this repository to a Gradio Space
38
+ - The application will automatically download required models
39
+ - Use `app_gradio_spaces.py` as the entry point
40
+
41
+ 2. **Local Deployment**:
42
+ - Clone this repository
43
+ - Install dependencies: `pip install -r requirements.txt`
44
+ - Run the application: `python app.py`
45
+
46
+ 3. **Container Deployment with Cog**:
47
+ - Install Cog: `curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)`
48
+ - Build the container: `cog build`
49
+ - Run the container: `cog predict -i [email protected]`
50
+
51
+ ## Important Notes
52
+
53
+ - The actual model files are not included in this repository
54
+ - During deployment, the application will download:
55
+ - Whisper speech recognition model
56
+ - LLaMA-Omni model (simulated in this setup)
57
+ - HiFi-GAN vocoder
58
+
59
+ ## Testing the Setup
60
+
61
+ Run the setup check script to verify your environment:
62
+
63
+ ```bash
64
+ python check_setup.py
65
+ ```
66
+
67
+ This will check for required directories, files, and Python packages.
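The contents of `check_setup.py` are not shown in this commit; the sketch below illustrates the kind of check it performs. The directory, file, and package names follow the structure described above and are otherwise assumptions:

```python
import importlib.util
import os

# Items the deployment described above depends on (illustrative lists)
REQUIRED_DIRS = ["omni_speech", "omni_speech/serve", "omni_speech/infer"]
REQUIRED_FILES = ["app.py", "requirements.txt"]
REQUIRED_PACKAGES = ["gradio", "torch", "whisper", "requests"]

def check_setup() -> bool:
    ok = True
    for path in REQUIRED_DIRS + REQUIRED_FILES:
        if not os.path.exists(path):  # missing directory or file
            print(f"Missing: {path}")
            ok = False
    for pkg in REQUIRED_PACKAGES:
        if importlib.util.find_spec(pkg) is None:  # package not importable
            print(f"Package not installed: {pkg}")
            ok = False
    return ok

if __name__ == "__main__":
    raise SystemExit(0 if check_setup() else 1)
```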
app.py CHANGED
@@ -1,377 +1,132 @@
1
- import gradio as gr
2
- import torch
3
- from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
4
  import os
5
- import warnings
6
- import importlib
7
- import sys
8
  import subprocess
9
- import numpy as np
10
- import tempfile
11
- import soundfile as sf
12
- import logging
13
- import huggingface_hub
14
- from huggingface_hub import snapshot_download
15
-
16
- # Configurar logging
17
- logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
18
- logger = logging.getLogger(__name__)
19
-
20
- # Verificar modo sem download (primeiro, antes de importar model_downloader)
21
- NO_DOWNLOAD = os.environ.get("NO_DOWNLOAD", "0").lower() in ("1", "true", "yes")
22
- logger.info(f"Inicializando app.py com NO_DOWNLOAD={NO_DOWNLOAD} (valor da env: {os.environ.get('NO_DOWNLOAD', 'não definido')})")
23
-
24
- # Import do novo model_downloader
25
- try:
26
- from model_downloader import download_model_if_needed, download_all_models, get_model_repo_id, NO_DOWNLOAD as DOWNLOADER_NO_DOWNLOAD
27
- # Verificar se os valores são consistentes
28
- if NO_DOWNLOAD != DOWNLOADER_NO_DOWNLOAD:
29
- logger.warning(f"Inconsistência detectada: NO_DOWNLOAD no app.py={NO_DOWNLOAD}, mas NO_DOWNLOAD no model_downloader.py={DOWNLOADER_NO_DOWNLOAD}")
30
- # Atualizar para o valor no model_downloader.py
31
- NO_DOWNLOAD = DOWNLOADER_NO_DOWNLOAD
32
- except ImportError:
33
- logger.warning("model_downloader não pôde ser importado, trabalhando sem ele")
34
- # Definir funções vazias para manter compatibilidade
35
- def download_model_if_needed(model_key): return False
36
- def download_all_models(): pass
37
- def get_model_repo_id(model_key): return None
38
-
39
- # Configuração do caminho para os modelos
40
- MODELS_DIR = os.environ.get("MODELS_DIR", "models")
41
- os.makedirs(MODELS_DIR, exist_ok=True)
42
-
43
- # --- Model Configuration ---
44
- whisper_model_id = "openai/whisper-tiny"
45
- llama_omni_model_id = "ICTNLP/LLaMA-Omni2-0.5B" # Modelo específico que queremos usar
46
- HF_TOKEN = os.environ.get("HF_TOKEN", None) # Token para acessar modelos privados, se necessário
47
-
48
- # --- Device Configuration ---
49
- if torch.cuda.is_available():
50
- device_for_pipelines = 0 # Use the first GPU for Hugging Face pipelines
51
- torch_device = "cuda:0" # PyTorch device string
52
- dtype_for_pipelines = torch.float16
53
- else:
54
- device_for_pipelines = -1 # Use CPU for Hugging Face pipelines
55
- torch_device = "cpu"
56
- dtype_for_pipelines = torch.float32
57
-
58
- logger.info(f"Using device: {torch_device} for model loading.")
59
- logger.info(f"Pipelines will use device_id: {device_for_pipelines} and dtype: {dtype_for_pipelines}")
60
-
61
- # --- Check Download Mode ---
62
- if NO_DOWNLOAD:
63
- logger.warning("Modo NO_DOWNLOAD ativado. Os modelos não serão baixados, usando diretamente do Hugging Face Hub.")
64
- # Usar IDs dos modelos diretamente do Hugging Face
65
- whisper_repo_id = get_model_repo_id("speech_encoder") or "openai/whisper-large-v3"
66
- llama_omni_repo_id = get_model_repo_id("llama_omni2") or llama_omni_model_id
67
-
68
- # Definir caminhos para modelo
69
- whisper_path_to_use = whisper_repo_id
70
- model_path_to_use = llama_omni_repo_id
71
-
72
- logger.info(f"Usando modelo whisper direto do HF: {whisper_path_to_use}")
73
- logger.info(f"Usando modelo LLaMA-Omni2 direto do HF: {model_path_to_use}")
74
- else:
75
- # --- Download Models if Needed ---
76
- logger.info("Verificando se os modelos estão disponíveis localmente...")
77
-
78
- # Download do modelo de speech recognition (Whisper)
79
- download_model_if_needed("speech_encoder")
80
-
81
- # Download do modelo de síntese de voz
82
- download_model_if_needed("cosy2_decoder")
83
-
84
- # Download do modelo LLaMA-Omni2
85
- download_model_if_needed("llama_omni2")
86
-
87
- # Configurar caminhos para modelos locais
88
- whisper_local_path = os.path.join(MODELS_DIR, "speech_encoder", "whisper-large-v3")
89
- whisper_path_to_use = whisper_local_path if os.path.exists(whisper_local_path) else whisper_model_id
90
-
91
- local_model_path = os.path.join(MODELS_DIR, "LLaMA-Omni2-0.5B")
92
- model_path_to_use = local_model_path if os.path.exists(local_model_path) and os.path.isdir(local_model_path) else llama_omni_model_id
93
-
94
- # --- Load Speech-to-Text (ASR) Pipeline ---
95
- asr_pipeline_instance = None
96
- try:
97
- logger.info(f"Loading ASR model: {whisper_path_to_use}...")
98
-
99
- asr_pipeline_instance = pipeline(
100
- "automatic-speech-recognition",
101
- model=whisper_path_to_use,
102
- torch_dtype=dtype_for_pipelines,
103
- device=device_for_pipelines
104
- )
105
- logger.info(f"ASR model loaded successfully.")
106
- except Exception as e:
107
- logger.error(f"Error loading ASR model: {e}")
108
- asr_pipeline_instance = None
109
-
110
- # --- Load Text Generation Model ---
111
- text_gen_pipeline_instance = None
112
- text_generation_model_id = None # Will be set to the model that successfully loads
113
-
114
- try:
115
- logger.info(f"Attempting to load LLaMA-Omni2 model: {model_path_to_use}...")
116
- # LLaMA models often require specific loading configurations
117
- tokenizer = AutoTokenizer.from_pretrained(
118
- model_path_to_use,
119
- trust_remote_code=True,
120
- use_fast=False,
121
- token=HF_TOKEN
122
- )
123
-
124
- model = AutoModelForCausalLM.from_pretrained(
125
- model_path_to_use,
126
- torch_dtype=dtype_for_pipelines,
127
- trust_remote_code=True,
128
- device_map="auto" if torch.cuda.is_available() else None,
129
- low_cpu_mem_usage=True,
130
- token=HF_TOKEN
131
- )
132
-
133
- # Check if this is a specialized Omni2 model with audio capabilities
134
- is_omni2_speech_model = hasattr(model, "generate_with_speech") or hasattr(model, "generate_speech")
135
 
136
- text_gen_pipeline_instance = pipeline(
137
- "text-generation",
138
- model=model,
139
- tokenizer=tokenizer,
140
- torch_dtype=dtype_for_pipelines,
141
- device=device_for_pipelines if not torch.cuda.is_available() else None
142
- )
143
- text_generation_model_id = llama_omni_model_id
144
- logger.info(f"LLaMA-Omni2 model loaded successfully.")
145
- logger.info(f"Model has speech generation capabilities: {is_omni2_speech_model}")
146
 
147
- except Exception as e:
148
- logger.error(f"Error loading LLaMA-Omni2 model: {e}")
149
- logger.error("Não foi possível carregar o modelo LLaMA-Omni2. Verifique se o modelo está disponível ou se há erro nas configurações.")
150
- text_gen_pipeline_instance = None
151
-
152
- # --- Core Functions ---
153
- def transcribe_audio_input(audio_filepath):
154
- if not asr_pipeline_instance:
155
- return "ASR model not available. Please check startup logs.", ""
156
- if audio_filepath is None:
157
- return "No audio file provided for transcription.", ""
158
  try:
159
- logger.info(f"Transcribing: {audio_filepath}")
160
- result = asr_pipeline_instance(audio_filepath, chunk_length_s=30)
161
- transcribed_text = result["text"]
162
- logger.info(f"Transcription: '{transcribed_text}'")
163
- return transcribed_text, transcribed_text
164
- except Exception as e:
165
- logger.error(f"Transcription error: {e}")
166
- return f"Error during transcription: {str(e)}", ""
167
-
168
- def generate_text_response(prompt_text):
169
- """Generate both text and speech response if possible"""
170
- if not text_gen_pipeline_instance:
171
- logger.error("Text generation model not available for response generation")
172
- return f"Text generation model not available. Check logs.", None
173
- if not prompt_text or not prompt_text.strip():
174
- return "Prompt is empty. Please provide text for generation.", None
175
 
176
- try:
177
- logger.info(f"Generating response for prompt (first 100 chars): '{prompt_text[:100]}...'")
178
-
179
- # Try to use special speech generation if available
180
- model = text_gen_pipeline_instance.model
181
-
182
- # Check if model has speech generation capability
183
- if hasattr(model, "generate_with_speech") or hasattr(model, "generate_speech"):
184
- try:
185
- # Prepare inputs
186
- inputs = text_gen_pipeline_instance.tokenizer(prompt_text, return_tensors="pt").to(model.device)
187
-
188
- # Generate with speech
189
- if hasattr(model, "generate_with_speech"):
190
- logger.info("Using generate_with_speech method")
191
- outputs = model.generate_with_speech(
192
- **inputs,
193
- max_new_tokens=150,
194
- do_sample=True,
195
- temperature=0.7,
196
- top_p=0.9
197
- )
198
- text_response = text_gen_pipeline_instance.tokenizer.decode(outputs["sequences"][0], skip_special_tokens=True)
199
- audio_data = outputs.get("speech_output", None)
200
- elif hasattr(model, "generate_speech"):
201
- logger.info("Using generate_speech method")
202
- # Text generation first
203
- output_ids = model.generate(
204
- **inputs,
205
- max_new_tokens=150,
206
- do_sample=True,
207
- temperature=0.7,
208
- top_p=0.9
209
- )
210
- text_response = text_gen_pipeline_instance.tokenizer.decode(output_ids[0], skip_special_tokens=True)
211
-
212
- # Then speech generation
213
- audio_data = model.generate_speech(output_ids)
214
-
215
- # Save audio if we got it
216
- if audio_data is not None:
217
- audio_path = save_audio_to_temp_file(audio_data)
218
- return text_response, audio_path
219
- else:
220
- logger.warning("No audio data was generated")
221
- return text_response, None
222
-
223
- except Exception as speech_error:
224
- logger.error(f"Error generating speech with LLaMA-Omni2: {speech_error}")
225
- logger.info("Falling back to text-only generation")
226
-
227
- # Parameters optimized for LLaMA-Omni2 text-only generation
228
- logger.info("Using text-only generation")
229
- generated_outputs = text_gen_pipeline_instance(
230
- prompt_text,
231
- max_new_tokens=150,
232
- do_sample=True,
233
- temperature=0.7,
234
- top_p=0.9,
235
- num_return_sequences=1
236
- )
237
-
238
- response_text = generated_outputs[0]["generated_text"]
239
- logger.info(f"Generated text-only response with length: {len(response_text)}")
240
- return response_text, None
241
- except Exception as e:
242
- logger.error(f"Text generation error: {e}")
243
- return f"Error during text generation: {str(e)}", None
244
-
245
- def save_audio_to_temp_file(audio_data):
246
- """Save audio data to a temporary file and return the path"""
247
- try:
248
- # Create a temporary file
249
- with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
250
- temp_path = tmp_file.name
251
-
252
- # Convert audio data to the right format if needed and save
253
- if isinstance(audio_data, np.ndarray):
254
- # Assuming sample rate of 16000 Hz, which is common for speech models
255
- sf.write(temp_path, audio_data, 16000)
256
- elif isinstance(audio_data, torch.Tensor):
257
- # Convert tensor to numpy array
258
- audio_np = audio_data.cpu().numpy()
259
- sf.write(temp_path, audio_np, 16000)
260
- else:
261
- print(f"Unknown audio data type: {type(audio_data)}")
262
- return None
263
-
264
- print(f"Audio saved to temporary file: {temp_path}")
265
- return temp_path
266
- except Exception as e:
267
- print(f"Error saving audio to file: {e}")
268
- return None
269
-
270
- def combined_pipeline_process(audio_filepath):
271
- if audio_filepath is None:
272
- return "No audio input.", "No audio input.", None
273
-
274
- transcribed_text, _ = transcribe_audio_input(audio_filepath)
275
-
276
- if not asr_pipeline_instance or "Error during transcription" in transcribed_text or not transcribed_text.strip():
277
- error_msg_for_generation = "Cannot generate response: Transcription failed or was empty."
278
- if not asr_pipeline_instance:
279
- error_msg_for_generation = "Cannot generate response: ASR model not loaded."
280
- return transcribed_text, error_msg_for_generation, None
281
-
282
- if not text_gen_pipeline_instance:
283
- return transcribed_text, f"Cannot generate response: No text generation model available.", None
284
 
285
- final_response, audio_path = generate_text_response(transcribed_text)
286
- return transcribed_text, final_response, audio_path
287
-
288
- # Determine model status for UI
289
- if text_generation_model_id == llama_omni_model_id:
290
- llama_model_status = "LLaMA-Omni2-0.5B loaded successfully"
291
- using_model = "LLaMA-Omni2-0.5B"
292
- else:
293
- llama_model_status = "Failed to load LLaMA-Omni2 model"
294
- using_model = "No model available"
295
-
296
- # --- Gradio Interface Definition ---
297
- with gr.Blocks(theme=gr.themes.Soft(), title="Whisper + LLaMA-Omni2 Demo") as app_interface:
298
- gr.Markdown(
299
- f"""
300
- # Speech-to-Text and Text/Speech Generation Demo
301
 
302
- Esta aplicação usa **OpenAI Whisper Tiny** para reconhecimento de fala e **LLaMA-Omni2-0.5B** para geração de texto e fala.
 
303
 
304
- **Modelo em uso:** {using_model}
 
305
 
306
- Envie um arquivo de áudio para transcrevê-lo. O texto transcrito será então usado como prompt para o modelo de geração de texto/fala.
307
- """
308
- )
309
-
310
- with gr.Tab("Pipeline Completo: Áudio -> Transcrição -> Geração"):
311
- gr.Markdown("### Etapa 1: Envie Áudio -> Etapa 2: Transcrição -> Etapa 3: Geração de Texto/Fala")
312
- input_audio_pipeline = gr.Audio(type="filepath", label="Envie seu arquivo de áudio (.wav, .mp3)")
313
- submit_button_full = gr.Button("Executar Processo Completo", variant="primary")
314
- output_transcription_pipeline = gr.Textbox(label="Texto Transcrito (do Whisper)", lines=5)
315
- model_label = f"Texto Gerado (do {using_model})"
316
- output_generation_pipeline = gr.Textbox(label=model_label, lines=7)
317
- output_audio_pipeline = gr.Audio(label="Fala Gerada (se disponível)", visible=True)
318
 
319
- submit_button_full.click(
320
- fn=combined_pipeline_process,
321
- inputs=[input_audio_pipeline],
322
- outputs=[output_transcription_pipeline, output_generation_pipeline, output_audio_pipeline]
323
- )
324
-
325
- with gr.Tab("Testar Reconhecimento de Fala (Whisper Tiny)"):
326
- gr.Markdown("### Transcreva áudio para texto usando Whisper Tiny.")
327
- input_audio_asr = gr.Audio(type="filepath", label="Envie Áudio para Reconhecimento")
328
- submit_button_asr = gr.Button("Transcrever Áudio", variant="secondary")
329
- output_transcription_asr = gr.Textbox(label="Resultado da Transcrição", lines=10)
330
-
331
- def asr_only_ui(audio_file):
332
- if audio_file is None: return "Por favor, envie um arquivo de áudio."
333
- transcription, _ = transcribe_audio_input(audio_file)
334
- return transcription
335
-
336
- submit_button_asr.click(
337
- fn=asr_only_ui,
338
- inputs=[input_audio_asr],
339
- outputs=[output_transcription_asr]
340
- )
341
-
342
- with gr.Tab(f"Testar Geração de Texto/Fala"):
343
- model_name_gen = using_model
344
- gr.Markdown(f"### Gere texto e fala a partir de um prompt usando {model_name_gen}.")
345
- input_text_prompt_gen = gr.Textbox(label="Seu Prompt de Texto", placeholder="Digite seu texto aqui...", lines=5)
346
- submit_button_gen = gr.Button("Gerar Texto e Fala", variant="secondary")
347
- output_generation_gen = gr.Textbox(label="Resultado do Texto Gerado", lines=10)
348
- output_audio_gen = gr.Audio(label="Fala Gerada (se disponível)")
349
-
350
- def text_generation_ui(prompt):
351
- if not prompt or not prompt.strip():
352
- return "Por favor, forneça um prompt primeiro.", None
353
- response_text, audio_path = generate_text_response(prompt)
354
- return response_text, audio_path
355
 
356
- submit_button_gen.click(
357
- fn=text_generation_ui,
358
- inputs=[input_text_prompt_gen],
359
- outputs=[output_generation_gen, output_audio_gen]
360
- )
361
-
362
- gr.Markdown("--- ")
363
- gr.Markdown("### Status do Carregamento do Modelo (na inicialização do aplicativo):")
364
- asr_load_status = "Carregado com sucesso" if asr_pipeline_instance else "Falha ao carregar (verifique os logs)"
365
 
366
- gr.Markdown(f"* **Modelo Whisper ({whisper_model_id}):** `{asr_load_status}`")
367
- gr.Markdown(f"* **Modelo LLaMA-Omni2 ({llama_omni_model_id}):** `{llama_model_status}`")
 
 
 
 
 
 
368
 
369
- # --- Launch the Gradio App ---
370
  if __name__ == "__main__":
371
- print("Launching Gradio demo...")
372
- try:
373
- app_interface.launch(share=True, server_name="0.0.0.0")
374
- except Exception as e:
375
- print(f"Error launching with share=True: {e}")
376
- print("Trying to launch without sharing...")
377
- app_interface.launch(server_name="0.0.0.0")
 
 
 
 
1
  import os
 
 
 
2
  import subprocess
3
+ import threading
4
+ import time
5
+ import gradio as gr
6
+ import whisper
7
+ import requests
8
+
9
+ # Configuration
10
+ MODEL_NAME = "Llama-3.1-8B-Omni"
11
+ CONTROLLER_PORT = 10000
12
+ WEB_SERVER_PORT = 8000
13
+ MODEL_WORKER_PORT = 40000
14
+
15
+ # Paths
16
+ VOCODER_PATH = "vocoder/g_00500000"
17
+ VOCODER_CFG = "vocoder/config.json"
18
+
19
+ def download_models():
20
+ """Ensure that required models are available"""
21
+ os.makedirs("models/speech_encoder", exist_ok=True)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
22
 
23
+ # Download Whisper model if needed (this will happen during deployment)
24
+ print("Setting up Whisper model...")
25
+ whisper.load_model("large-v3", download_root="models/speech_encoder/")
 
 
 
 
 
 
 
26
 
27
+ # Download vocoder if needed
28
+ if not os.path.exists(VOCODER_PATH):
29
+ print("Downloading vocoder...")
30
+ os.makedirs("vocoder", exist_ok=True)
31
+ subprocess.run([
32
+ "wget", "https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000",
33
+ "-P", "vocoder/"
34
+ ])
35
+ subprocess.run([
36
+ "wget", "https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/config.json",
37
+ "-P", "vocoder/"
38
+ ])
39
+
40
+ def start_controller():
41
+ """Start the controller process"""
42
+ print("Starting controller...")
43
+ controller_process = subprocess.Popen([
44
+ "python", "-m", "omni_speech.serve.controller",
45
+ "--host", "0.0.0.0",
46
+ "--port", str(CONTROLLER_PORT)
47
+ ])
48
+ time.sleep(5) # Wait for controller to start
49
+ return controller_process
50
+
51
+ def start_model_worker():
52
+ """Start the model worker process"""
53
+ print("Starting model worker...")
54
+ worker_process = subprocess.Popen([
55
+ "python", "-m", "omni_speech.serve.model_worker",
56
+ "--host", "0.0.0.0",
57
+ "--controller", f"http://localhost:{CONTROLLER_PORT}",
58
+ "--port", str(MODEL_WORKER_PORT),
59
+ "--worker", f"http://localhost:{MODEL_WORKER_PORT}",
60
+ "--model-path", MODEL_NAME,
61
+ "--model-name", MODEL_NAME,
62
+ "--s2s"
63
+ ])
64
+ time.sleep(10) # Wait for model worker to start
65
+ return worker_process
66
+
67
+ def start_web_server():
68
+ """Start the web server process"""
69
+ print("Starting web server...")
70
+ web_process = subprocess.Popen([
71
+ "python", "-m", "omni_speech.serve.gradio_web_server",
72
+ "--controller", f"http://localhost:{CONTROLLER_PORT}",
73
+ "--port", str(WEB_SERVER_PORT),
74
+ "--model-list-mode", "reload",
75
+ "--vocoder", VOCODER_PATH,
76
+ "--vocoder-cfg", VOCODER_CFG
77
+ ])
78
+ return web_process
79
+
80
+ def check_services():
81
+ """Check if all services are running"""
82
  try:
83
+ controller_resp = requests.get(f"http://localhost:{CONTROLLER_PORT}/status").json()
84
+ web_server_resp = requests.get(f"http://localhost:{WEB_SERVER_PORT}/").status_code
85
+ return controller_resp["status"] == "ok" and web_server_resp == 200
86
+ except Exception:
87
+ return False
88
+
89
+ def main():
90
+ # Download required models
91
+ download_models()
 
 
 
 
 
 
 
92
 
93
+ # Start all services
94
+ controller = start_controller()
95
+ worker = start_model_worker()
96
+ web_server = start_web_server()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
97
 
98
+ # Create a simple redirection interface
99
+ with gr.Blocks() as demo:
100
+ gr.Markdown("# 🦙🎧 LLaMA-Omni")
101
+ gr.Markdown("## Starting LLaMA-Omni services...")
 
 
 
 
 
 
 
 
 
 
 
 
102
 
103
+ with gr.Row():
104
+ status = gr.Textbox(value="Initializing...", label="Status")
105
 
106
+ with gr.Row():
107
+ redirect_btn = gr.Button("Go to LLaMA-Omni Interface")
108
 
109
+ def update_status():
110
+ if check_services():
111
+ return "All services running! Click the button below to access the interface."
112
+ else:
113
+ return "Still starting services... Please wait."
 
 
 
 
 
 
 
114
 
115
+ def redirect():
116
+ return gr.Redirect(f"http://localhost:{WEB_SERVER_PORT}")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
117
 
118
+ # Update status every 5 seconds
119
+ demo.load(update_status, outputs=status, every=5)
120
+ redirect_btn.click(redirect)
 
 
 
 
 
 
121
 
122
+ # Launch the Gradio interface
123
+ try:
124
+ demo.launch(server_name="0.0.0.0")
125
+ finally:
126
+ # Clean up processes when Gradio is closed
127
+ controller.terminate()
128
+ worker.terminate()
129
+ web_server.terminate()
130
 
 
131
  if __name__ == "__main__":
132
+ main()
 
 
 
 
 
 
app.yaml DELETED
@@ -1,23 +0,0 @@
1
- sdk: docker
2
- build_config:
3
- gpu: true
4
- cuda: "11.8"
5
- python_version: "3.10"
6
- system_packages:
7
- - "ffmpeg"
8
- - "libsndfile1"
9
- - "build-essential"
10
- - "ninja-build"
11
- - "git"
12
- resources:
13
- gpu: A10G
14
- cpu: 4
15
- memory: "30G"
16
- disk: "10G"
17
- models:
18
- - "openai/whisper-tiny"
19
- - "ICTNLP/LLaMA-Omni2-0.5B"
20
- secrets:
21
- - name: HF_TOKEN
22
- help: "Token de autenticação do Hugging Face (opcional)"
23
- required: false
app_gradio_spaces.py ADDED
@@ -0,0 +1,160 @@
 
 
 
 
 
1
+ import os
2
+ import sys
3
+ import subprocess
4
+ import threading
5
+ import time
6
+ import gradio as gr
7
+
8
+ def run_background_process(cmd, name):
9
+ """Run a background process and return the process object."""
10
+ print(f"Starting {name}...")
11
+ process = subprocess.Popen(
12
+ cmd,
13
+ stdout=subprocess.PIPE,
14
+ stderr=subprocess.STDOUT,
15
+ text=True,
16
+ bufsize=1,
17
+ universal_newlines=True,
18
+ shell=True
19
+ )
20
+ return process
21
+
22
+ def read_process_output(process, output_box, name):
23
+ """Read and update the output from a process."""
24
+ full_output = f"### {name} Output:\n\n"
25
+ for line in process.stdout:
26
+ full_output += line
27
+ output_box.update(value=full_output)
28
+
29
+ # Process ended
30
+ return_code = process.wait()
31
+ full_output += f"\n\nProcess exited with code {return_code}"
32
+ output_box.update(value=full_output)
33
+
34
+ def setup_environment():
35
+ """Set up the environment by installing dependencies and downloading models."""
36
+ # Create necessary directories
37
+ os.makedirs("models/speech_encoder", exist_ok=True)
38
+ os.makedirs("vocoder", exist_ok=True)
39
+
40
+ # Download whisper model
41
+ os.system("pip install openai-whisper>=20231117")
42
+ os.system("pip install fairseq==0.12.2")
43
+
44
+ # Download vocoder
45
+ if not os.path.exists("vocoder/g_00500000"):
46
+ os.system("wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000 -P vocoder/")
47
+ os.system("wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/config.json -P vocoder/")
48
+
49
+ # Initialize Whisper (it will be downloaded automatically)
50
+ os.system("python -c \"import whisper; whisper.load_model('large-v3', download_root='models/speech_encoder/')\"")
51
+
52
+ return "✅ Environment setup complete!"
53
+
54
+ def start_services(controller_output, model_worker_output, web_server_output):
55
+ """Start the controller, model worker, and web server."""
56
+ # Start the controller
57
+ controller_process = run_background_process(
58
+ "python -m omni_speech.serve.controller --host 0.0.0.0 --port 10000",
59
+ "Controller"
60
+ )
61
+
62
+ # Start a thread to read controller output
63
+ controller_thread = threading.Thread(
64
+ target=read_process_output,
65
+ args=(controller_process, controller_output, "Controller"),
66
+ daemon=True
67
+ )
68
+ controller_thread.start()
69
+
70
+ # Wait for controller to start
71
+ time.sleep(5)
72
+
73
+ # Start the model worker
74
+ model_worker_process = run_background_process(
75
+ "python -m omni_speech.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path Llama-3.1-8B-Omni --model-name Llama-3.1-8B-Omni --s2s",
76
+ "Model Worker"
77
+ )
78
+
79
+ # Start a thread to read model worker output
80
+ model_worker_thread = threading.Thread(
81
+ target=read_process_output,
82
+ args=(model_worker_process, model_worker_output, "Model Worker"),
83
+ daemon=True
84
+ )
85
+ model_worker_thread.start()
86
+
87
+ # Wait for model worker to start
88
+ time.sleep(10)
89
+
90
+ # Start the web server
91
+ web_server_process = run_background_process(
92
+ "python -m omni_speech.serve.gradio_web_server --controller http://localhost:10000 --port 8001 --model-list-mode reload --vocoder vocoder/g_00500000 --vocoder-cfg vocoder/config.json",
93
+ "Web Server"
94
+ )
95
+
96
+ # Start a thread to read web server output
97
+ web_server_thread = threading.Thread(
98
+ target=read_process_output,
99
+ args=(web_server_process, web_server_output, "Web Server"),
100
+ daemon=True
101
+ )
102
+ web_server_thread.start()
103
+
104
+ # Wait for web server to start
105
+ time.sleep(5)
106
+
107
+ return "✅ All services started! Click the 'Open Interface' button below."
108
+
109
+ def build_ui():
110
+ """Build the Gradio UI."""
111
+ with gr.Blocks() as demo:
112
+ gr.Markdown("# 🦙🎧 LLaMA-Omni Deployment")
113
+
114
+ with gr.Tab("Setup"):
115
+ setup_btn = gr.Button("Setup Environment")
116
+ setup_output = gr.Textbox(label="Setup Output", value="Click 'Setup Environment' to start.")
117
+ setup_btn.click(setup_environment, outputs=setup_output)
118
+
119
+ with gr.Tab("Services"):
120
+ start_btn = gr.Button("Start LLaMA-Omni Services")
121
+ status_output = gr.Textbox(label="Status", value="Click 'Start LLaMA-Omni Services' to begin.")
122
+
123
+ with gr.Accordion("Service Logs", open=False):
124
+ controller_output = gr.Markdown("Controller not started")
125
+ model_worker_output = gr.Markdown("Model Worker not started")
126
+ web_server_output = gr.Markdown("Web Server not started")
127
+
128
+ start_btn.click(
129
+ start_services,
130
+ inputs=[],
131
+ outputs=[status_output, controller_output, model_worker_output, web_server_output]
132
+ )
133
+
134
+ interface_btn = gr.Button("Open Interface")
135
+ interface_btn.click(lambda: gr.update(value="http://localhost:8001"), None, None)
136
+
137
+ with gr.Tab("About"):
138
+ gr.Markdown("""
139
+ # About LLaMA-Omni
140
+
141
+ LLaMA-Omni is a speech-language model built upon Llama-3.1-8B-Instruct. It supports low-latency and high-quality speech interactions, simultaneously generating both text and speech responses based on speech instructions.
142
+
143
+ ## Features
144
+
145
+ * Built on Llama-3.1-8B-Instruct, ensuring high-quality responses
146
+ * Low-latency speech interaction with a latency as low as 226ms
147
+ * Simultaneous generation of both text and speech responses
148
+
149
+ ## License
150
+
151
+ This code is released under the Apache-2.0 License. The model is intended for academic research purposes only and may NOT be used for commercial purposes.
152
+
153
+ Original work by Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng.
154
+ """)
155
+
156
+ return demo
157
+
158
+ if __name__ == "__main__":
159
+ demo = build_ui()
160
+ demo.launch(server_port=7860)
audio_interface.py DELETED
@@ -1,451 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Audio interface for LLaMA-Omni2 that accepts audio input and returns audio output.
4
- This interface:
5
- 1. Transcribes audio input using Whisper
6
- 2. Processes the transcription with LLaMA-Omni2 model
7
- 3. Synthesizes the response back to audio using CosyVoice 2
8
-
9
- Enhanced with streaming generation and read-write scheduling for real-time response.
10
- """
11
-
12
- import os
13
- import sys
14
- import argparse
15
- import logging
16
- import time
17
- import asyncio
18
- import tempfile
19
- from pathlib import Path
20
- from queue import Queue
21
- from threading import Thread
22
- import json
23
-
24
- import torch
25
- import torchaudio
26
- import gradio as gr
27
- import whisper
28
- import aiohttp
29
- import numpy as np
30
-
31
- # Import model downloader
32
- try:
33
- from model_downloader import download_model_if_needed, download_all_models, get_model_repo_id, NO_DOWNLOAD
34
- has_model_downloader = True
35
- except ImportError:
36
- has_model_downloader = False
37
- NO_DOWNLOAD = False
38
-
39
- # Configure logging
40
- logging.basicConfig(level=logging.INFO)
41
- logger = logging.getLogger(__name__)
42
-
43
- class AudioInterface:
44
- def __init__(
45
- self,
46
- controller_url: str,
47
- whisper_model_path: str,
48
- vocoder_dir: str,
49
- model_name: str = "LLaMA-Omni2-7B-Bilingual",
50
- read_tokens: int = 3,
51
- write_tokens: int = 10
52
- ):
53
- self.controller_url = controller_url
54
- self.whisper_model_path = whisper_model_path
55
- self.vocoder_dir = vocoder_dir
56
- self.model_name = model_name
57
- self.device = "cuda" if torch.cuda.is_available() else "cpu"
58
-
59
- # Read-write scheduling parameters for streaming generation
60
- self.read_tokens = read_tokens # Number of text tokens to read
61
- self.write_tokens = write_tokens # Number of speech tokens to write
62
-
63
- # Download required models if needed
64
- self._ensure_models_available()
65
-
66
- # Load Whisper model
67
- try:
68
- # Se NO_DOWNLOAD estiver ativado, usar diretamente o modelo do Hugging Face
69
- if has_model_downloader and NO_DOWNLOAD:
70
- whisper_model_path = "openai/whisper-large-v3"
71
- logger.info(f"Modo NO_DOWNLOAD: Carregando Whisper direto do Hugging Face: {whisper_model_path}")
72
-
73
- logger.info(f"Loading Whisper model from {whisper_model_path}")
74
- self.whisper_model = whisper.load_model("large-v3",
75
- download_root=whisper_model_path if not NO_DOWNLOAD else None,
76
- device=self.device)
77
- logger.info("Whisper model loaded successfully")
78
- except Exception as e:
79
- logger.error(f"Failed to load Whisper model: {e}")
80
- self.whisper_model = None
81
-
82
- # Load CosyVoice vocoder
83
- try:
84
- # Se NO_DOWNLOAD estiver ativado, usar diretamente o modelo do Hugging Face
85
- if has_model_downloader and NO_DOWNLOAD:
86
- logger.warning("Modo NO_DOWNLOAD ativado. O vocoder CosyVoice pode não funcionar corretamente sem os arquivos locais.")
87
-
88
- sys.path.insert(0, vocoder_dir)
89
- from cosy_voice_2.inference import CosyVoice
90
-
91
- self.vocoder = CosyVoice(
92
- device=self.device,
93
- model_path=vocoder_dir
94
- )
95
- logger.info(f"CosyVoice vocoder loaded from {vocoder_dir}")
96
- except Exception as e:
97
- logger.error(f"Failed to load CosyVoice vocoder: {e}")
98
- self.vocoder = None
99
-
100
- logger.info(f"Using LLaMA-Omni2 model: {model_name}")
101
-
102
- def _ensure_models_available(self):
103
- """Garante que os modelos necessários estão disponíveis"""
104
- # Verificar se temos o model_downloader disponível
105
- if has_model_downloader:
106
- if NO_DOWNLOAD:
107
- logger.info("Modo NO_DOWNLOAD ativado. Pulando verificação de modelos locais.")
108
- return
109
-
110
- logger.info("Verificando modelos necessários...")
111
-
112
- # Baixar modelo Whisper
113
- download_model_if_needed("speech_encoder")
114
-
115
- # Baixar modelo CosyVoice
116
- download_model_if_needed("cosy2_decoder")
117
-
118
- logger.info("Verificação de modelos concluída")
119
- else:
120
- logger.warning("model_downloader não está disponível. Assumindo que os modelos já estão disponíveis localmente.")
121
-
122
- async def get_worker_address(self):
123
- """Get the address of the worker serving the model"""
124
- try:
125
- async with aiohttp.ClientSession() as session:
126
- async with session.get(
127
- f"{self.controller_url}/get_worker_address?model_name={self.model_name}",
128
- timeout=30
129
- ) as response:
130
- if response.status == 200:
131
- data = await response.json()
132
- return data.get("address")
133
- else:
134
- logger.error(f"Failed to get worker address: {await response.text()}")
135
- return None
136
- except Exception as e:
137
- logger.error(f"Error getting worker address: {e}")
138
- return None
139
-
140
- async def generate_text(self, prompt: str, streaming=False):
141
- """Generate text from LLaMA-Omni2 model"""
142
- worker_addr = await self.get_worker_address()
143
- if not worker_addr:
144
- return f"Error: No worker available for model {self.model_name}"
145
-
146
- try:
147
- async with aiohttp.ClientSession() as session:
148
- # For streaming generation
149
- if streaming:
150
- async with session.post(
151
- f"{worker_addr}/generate_stream",
152
- json={"prompt": prompt},
153
- timeout=120
154
- ) as response:
155
- if response.status == 200:
156
- response_text = ""
157
- async for line in response.content:
158
- if line:
159
- data = json.loads(line)
160
- chunk = data.get("text", "")
161
- response_text += chunk
162
- yield response_text
163
- return response_text
164
- else:
165
- error_text = await response.text()
166
- logger.error(f"Failed to generate text stream: {error_text}")
167
- return f"Error: {error_text}"
168
- # For non-streaming generation
169
- else:
170
- async with session.post(
171
- f"{worker_addr}/generate",
172
- json={"prompt": prompt},
173
- timeout=120
174
- ) as response:
175
- if response.status == 200:
176
- data = await response.json()
177
- return data.get("response", "No response received from model")
178
- else:
179
- error_text = await response.text()
180
- logger.error(f"Failed to generate text: {error_text}")
181
- return f"Error: {error_text}"
182
- except Exception as e:
183
- logger.error(f"Error generating text: {e}")
184
- return f"Error: {str(e)}"
185
-
186
- def transcribe_audio(self, audio_path):
187
- """Transcribe audio using Whisper"""
188
- if self.whisper_model is None:
189
- return "Error: Whisper model not loaded"
190
-
191
- try:
192
- logger.info(f"Transcribing audio from {audio_path}")
193
- result = self.whisper_model.transcribe(audio_path)
194
- logger.info("Transcription completed")
195
- return result["text"]
196
- except Exception as e:
197
- logger.error(f"Error transcribing audio: {e}")
198
- return f"Error transcribing audio: {str(e)}"
199
-
200
- def synthesize_speech(self, text):
201
- """Synthesize speech from text using CosyVoice"""
202
- if self.vocoder is None:
203
- return None, 16000, "Error: Vocoder not loaded"
204
-
205
- try:
206
- logger.info("Synthesizing speech from text response")
207
- # Generate speech using CosyVoice
208
- waveform = self.vocoder.inference(text)
209
- sample_rate = self.vocoder.sample_rate
210
-
211
- # Convert to numpy array for Gradio
212
- if isinstance(waveform, torch.Tensor):
213
- waveform = waveform.cpu().numpy()
214
-
215
- logger.info("Speech synthesis completed")
216
- return waveform, sample_rate, None
217
- except Exception as e:
218
- logger.error(f"Error synthesizing speech: {e}")
219
- return None, 16000, f"Error synthesizing speech: {str(e)}"
220
-
221
- async def synthesize_speech_chunk(self, text_chunk):
222
- """Synthesize speech for a single text chunk"""
223
- if self.vocoder is None:
224
- return None, 16000, "Error: Vocoder not loaded"
225
-
226
- try:
227
- # Generate speech using CosyVoice for this chunk
228
- waveform = self.vocoder.inference(text_chunk)
229
- sample_rate = self.vocoder.sample_rate
230
-
231
- # Convert to numpy array
232
- if isinstance(waveform, torch.Tensor):
233
- waveform = waveform.cpu().numpy()
234
-
235
- return waveform, sample_rate, None
236
- except Exception as e:
237
- logger.error(f"Error synthesizing speech chunk: {e}")
238
- return None, 16000, f"Error synthesizing speech chunk: {str(e)}"
239
-
240
- async def stream_text_to_speech(self, text_generator):
241
- """Stream text to speech using read-write scheduling"""
242
- buffer = ""
243
- audio_chunks = []
244
-
245
- try:
246
- async for text in text_generator:
247
- # Accumulate text until we have enough to synthesize
248
- buffer += text
249
-
250
- # When we have enough tokens for synthesis (approximate by characters)
251
- if len(buffer.split()) >= self.read_tokens:
252
- # Process the buffer
253
- chunk_to_process = buffer
254
- buffer = ""
255
-
256
- # Synthesize this chunk
257
- audio_chunk, sample_rate, error = await self.synthesize_speech_chunk(chunk_to_process)
258
- if error:
259
- logger.error(f"Error in streaming synthesis: {error}")
260
- continue
261
-
262
- # Add to our collection of audio chunks
263
- audio_chunks.append(audio_chunk)
264
-
265
- # Yield the current concatenated audio
266
- if audio_chunks:
267
- # Concatenate audio chunks
268
- full_audio = np.concatenate(audio_chunks)
269
- yield full_audio, sample_rate, chunk_to_process
270
-
271
- # Process any remaining text in the buffer
272
- if buffer:
273
- audio_chunk, sample_rate, error = await self.synthesize_speech_chunk(buffer)
274
- if not error and audio_chunk is not None:
275
- audio_chunks.append(audio_chunk)
276
-
277
- # Final audio output
278
- if audio_chunks:
279
- full_audio = np.concatenate(audio_chunks)
280
- return full_audio, sample_rate, None
281
- else:
282
- return None, 16000, "No audio generated"
283
-
284
- except Exception as e:
285
- logger.error(f"Error in streaming text to speech: {e}")
286
- return None, 16000, f"Error in streaming text to speech: {str(e)}"
287
-
288
- async def process_audio(self, audio_data, sample_rate, streaming=False):
289
- """Process audio input and return audio output"""
290
- # Save the input audio to a temporary file
291
- with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_audio:
292
- temp_path = temp_audio.name
293
- # Convert sample rate if needed
294
- if sample_rate != 16000:
295
- resampler = torchaudio.transforms.Resample(
296
- orig_freq=sample_rate, new_freq=16000
297
- )
298
- audio_tensor = torch.tensor(audio_data).unsqueeze(0)
299
- audio_tensor = resampler(audio_tensor)
300
- audio_data = audio_tensor.squeeze(0).numpy()
301
- sample_rate = 16000
302
-
303
- # Save as WAV
304
- torchaudio.save(temp_path, torch.tensor(audio_data).unsqueeze(0), sample_rate)
305
-
306
- try:
307
- # Step 1: Transcribe audio
308
- transcription = self.transcribe_audio(temp_path)
309
- if transcription.startswith("Error"):
310
- return None, sample_rate, transcription, "Error occurred during transcription", transcription
311
-
312
- # Step 2: Process with LLaMA-Omni2
313
- if streaming:
314
- # For streaming mode, we use a generator
315
- text_generator = self.generate_text(transcription, streaming=True)
316
- audio_generator = self.stream_text_to_speech(text_generator)
317
- return audio_generator, transcription
318
- else:
319
- # For non-streaming mode
320
- response_text = await self.generate_text(transcription)
321
- if response_text.startswith("Error"):
322
- return None, sample_rate, transcription, response_text, response_text
323
-
324
- # Step 3: Synthesize speech
325
- audio_output, out_sample_rate, error = self.synthesize_speech(response_text)
326
- if error:
327
- return None, sample_rate, transcription, response_text, error
328
-
329
- return audio_output, out_sample_rate, transcription, response_text, None
330
- finally:
331
- # Clean up temporary file
332
- if os.path.exists(temp_path):
333
- os.unlink(temp_path)
334
-
335
- def build_interface(self):
336
- """Build Gradio interface"""
337
- with gr.Blocks(title="LLaMA-Omni2 Audio Interface") as demo:
338
- gr.Markdown("# LLaMA-Omni2 Audio Interface")
339
- gr.Markdown("Speak to LLaMA-Omni2 and hear its response in real-time")
340
-
341
- with gr.Row():
342
- with gr.Column():
343
- audio_input = gr.Audio(
344
- sources=["microphone", "upload"],
345
- type="numpy",
346
- label="Input Audio"
347
- )
348
- with gr.Row():
349
- submit_button = gr.Button("Process Audio", variant="primary")
350
- stream_button = gr.Button("Stream Audio Response", variant="secondary")
351
-
352
- with gr.Column():
353
- transcription = gr.Textbox(
354
- label="Transcription",
355
- interactive=False
356
- )
357
- response_text = gr.Textbox(
358
- label="Response Text",
359
- interactive=False
360
- )
361
- audio_output = gr.Audio(
362
- label="Response Audio",
363
- type="numpy",
364
- interactive=False
365
- )
366
- error_text = gr.Textbox(
367
- label="Errors (if any)",
368
- interactive=False,
369
- visible=False
370
- )
371
-
372
- async def process_wrapper(audio_data):
373
- if audio_data is None:
374
- return None, "No audio input detected", "Please record or upload audio", "No audio input detected"
375
-
376
- audio_array, sample_rate = audio_data
377
- output, out_sample_rate, trans, resp, error = await self.process_audio(audio_array, sample_rate, streaming=False)
378
-
379
- if error:
380
- gr.update(visible=True)
381
- return None, trans, resp, error
382
-
383
- return (output, out_sample_rate), trans, resp, ""
384
-
385
- async def stream_wrapper(audio_data):
386
- if audio_data is None:
387
- return None, "No audio input detected", "Please record or upload audio", "No audio input detected"
388
-
389
- audio_array, sample_rate = audio_data
390
- generator, transcription = await self.process_audio(audio_array, sample_rate, streaming=True)
391
-
392
- # Update transcription immediately
393
- yield None, transcription, "", ""
394
-
395
- # Start streaming
396
- current_text = ""
397
- async for audio_chunk, sr, text_chunk in generator:
398
- current_text += text_chunk
399
- yield (audio_chunk, sr), transcription, current_text, ""
400
-
401
- submit_button.click(
402
- fn=lambda audio: asyncio.create_task(process_wrapper(audio)),
403
- inputs=[audio_input],
404
- outputs=[audio_output, transcription, response_text, error_text]
405
- )
406
-
407
- stream_button.click(
408
- fn=lambda audio: stream_wrapper(audio),
409
- inputs=[audio_input],
410
- outputs=[audio_output, transcription, response_text, error_text]
411
- )
412
-
413
- return demo
414
-
415
-
416
- def main():
417
- parser = argparse.ArgumentParser(description="Audio interface for LLaMA-Omni2")
418
- parser.add_argument("--host", type=str, default="0.0.0.0")
419
- parser.add_argument("--port", type=int, default=7860)
420
- parser.add_argument("--controller-url", type=str, default="http://localhost:10000")
421
- parser.add_argument("--whisper-model-path", type=str, default="models/speech_encoder")
422
- parser.add_argument("--vocoder-dir", type=str, default="models/cosy2_decoder")
423
- parser.add_argument("--model-name", type=str, default="LLaMA-Omni2-7B-Bilingual")
424
- parser.add_argument("--read-tokens", type=int, default=3,
425
- help="Number of text tokens to read before generating speech")
426
- parser.add_argument("--write-tokens", type=int, default=10,
427
- help="Number of speech tokens to write for each read")
428
- parser.add_argument("--share", action="store_true", help="Create a public link")
429
- args = parser.parse_args()
430
-
431
- # Create the interface
432
- interface = AudioInterface(
433
- controller_url=args.controller_url,
434
- whisper_model_path=args.whisper_model_path,
435
- vocoder_dir=args.vocoder_dir,
436
- model_name=args.model_name,
437
- read_tokens=args.read_tokens,
438
- write_tokens=args.write_tokens
439
- )
440
-
441
- # Build and launch the interface
442
- demo = interface.build_interface()
443
- demo.queue()
444
- demo.launch(
445
- server_name=args.host,
446
- server_port=args.port,
447
- share=args.share
448
- )
449
-
450
- if __name__ == "__main__":
451
- main()
check_setup.py ADDED
@@ -0,0 +1,120 @@
1
+ import os
2
+ import sys
3
+ import importlib.util
4
+ import subprocess
5
+
6
+ # Define required directories
7
+ required_dirs = [
8
+ "omni_speech",
9
+ "omni_speech/serve",
10
+ "omni_speech/infer",
11
+ "vocoder"
12
+ ]
13
+
14
+ # Define required files
15
+ required_files = [
16
+ "app.py",
17
+ "omni_speech/__init__.py",
18
+ "omni_speech/serve/__init__.py",
19
+ "omni_speech/serve/controller.py",
20
+ "omni_speech/serve/model_worker.py",
21
+ "omni_speech/serve/gradio_web_server.py",
22
+ "omni_speech/infer/__init__.py",
23
+ "omni_speech/infer/inference.py",
24
+ "omni_speech/infer/run.sh"
25
+ ]
26
+
27
+ # Define required packages
28
+ required_packages = [
29
+ "torch",
30
+ "transformers",
31
+ "gradio",
32
+ "fastapi",
33
+ "uvicorn",
34
+ "pydantic",
35
+ "numpy",
36
+ "tqdm"
37
+ ]
38
+
39
+ def check_directory_structure():
40
+ """Check if all required directories exist."""
41
+ print("Checking directory structure...")
42
+ missing_dirs = []
43
+
44
+ for dir_path in required_dirs:
45
+ if not os.path.isdir(dir_path):
46
+ missing_dirs.append(dir_path)
47
+
48
+ if missing_dirs:
49
+ print(f"❌ Missing directories: {', '.join(missing_dirs)}")
50
+ return False
51
+ else:
52
+ print("✅ All required directories exist.")
53
+ return True
54
+
55
+ def check_required_files():
56
+ """Check if all required files exist."""
57
+ print("Checking required files...")
58
+ missing_files = []
59
+
60
+ for file_path in required_files:
61
+ if not os.path.isfile(file_path):
62
+ missing_files.append(file_path)
63
+
64
+ if missing_files:
65
+ print(f"❌ Missing files: {', '.join(missing_files)}")
66
+ return False
67
+ else:
68
+ print("✅ All required files exist.")
69
+ return True
70
+
71
+ def check_packages():
72
+ """Check if all required packages are installed."""
73
+ print("Checking required packages...")
74
+ missing_packages = []
75
+
76
+ for package in required_packages:
77
+ if importlib.util.find_spec(package) is None:
78
+ missing_packages.append(package)
79
+
80
+ if missing_packages:
81
+ print(f"❌ Missing packages: {', '.join(missing_packages)}")
82
+ return False
83
+ else:
84
+ print("✅ All required packages are installed.")
85
+ return True
86
+
87
+ def check_python_version():
88
+ """Check if Python version is compatible."""
89
+ print("Checking Python version...")
90
+ major, minor = sys.version_info[:2]
91
+
92
+ if major != 3 or minor < 10:
93
+ print(f"❌ Incompatible Python version: {major}.{minor}. Python 3.10+ is required.")
94
+ return False
95
+ else:
96
+ print(f"✅ Python version is compatible: {major}.{minor}")
97
+ return True
98
+
99
+ def main():
100
+ """Run all checks."""
101
+ print("🔍 Checking LLaMA-Omni setup...")
102
+ print("-" * 50)
103
+
104
+ checks = [
105
+ check_directory_structure(),
106
+ check_required_files(),
107
+ check_packages(),
108
+ check_python_version()
109
+ ]
110
+
111
+ print("-" * 50)
112
+
113
+ if all(checks):
114
+ print("✅ All checks passed! LLaMA-Omni is set up correctly.")
115
+ print("🚀 Run 'python app.py' to start the application.")
116
+ else:
117
+ print("❌ Some checks failed. Please fix the issues before running the application.")
118
+
119
+ if __name__ == "__main__":
120
+ main()
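
Note (not part of the commit): check_setup.py decides whether a dependency is present by resolving its import name with importlib.util.find_spec. A minimal sketch of that idiom on its own is shown below; the package names are illustrative, and the check only works when the pip name matches the import name, as it does for the packages listed above.

import importlib.util

def is_installed(import_name: str) -> bool:
    # Mirrors the check in check_setup.check_packages(): a package counts as
    # installed when Python can resolve a module spec for its import name.
    return importlib.util.find_spec(import_name) is not None

if __name__ == "__main__":
    for name in ("torch", "gradio", "definitely_not_installed"):
        print(name, "->", "installed" if is_installed(name) else "missing")
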
cog.yaml ADDED
@@ -0,0 +1,27 @@
1
+ build:
2
+ gpu: true
3
+ python_version: "3.10"
4
+ python_packages:
5
+ - "torch==2.0.1"
6
+ - "transformers==4.34.0"
7
+ - "accelerate==0.21.0"
8
+ - "gradio==3.50.2"
9
+ - "fastapi==0.104.0"
10
+ - "uvicorn==0.23.2"
11
+ - "pydantic==2.3.0"
12
+ - "openai-whisper==20231117"
13
+ - "numpy==1.24.0"
14
+ - "tqdm==4.66.1"
15
+ - "flash-attn==2.3.0"
16
+ - "requests==2.31.0"
17
+ system_packages:
18
+ - "wget"
19
+ - "ffmpeg"
20
+ - "libsndfile1"
21
+ run:
22
+ - "pip install -e git+https://github.com/pytorch/fairseq.git#egg=fairseq"
23
+ - "mkdir -p vocoder"
24
+ - "wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000 -P vocoder/"
25
+ - "wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/config.json -P vocoder/"
26
+
27
+ predict: "predict.py:Predictor"
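
Note (not part of the commit): cog.yaml points Cog at predict.py:Predictor, a file that is not included here. As a rough, hypothetical sketch only, the class Cog expects follows the BasePredictor pattern below; the model loading and synthesis steps are placeholders, not the project's actual implementation.

# predict.py, hypothetical skeleton; the real Predictor is not part of this commit.
from cog import BasePredictor, Input, Path

class Predictor(BasePredictor):
    def setup(self):
        # One-time heavy work (loading LLaMA-Omni and the vocoder) belongs here.
        # Left as a placeholder because the real loading code is not in this commit.
        self.model = None

    def predict(self, audio: Path = Input(description="Input speech (wav file)")) -> str:
        # Placeholder: the real implementation would transcribe the audio,
        # run the language model, and synthesize a spoken response.
        return f"Received audio file: {audio}"
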
gradio_app.py ADDED
@@ -0,0 +1,73 @@
1
+ import gradio as gr
2
+ import subprocess
3
+ import threading
4
+ import time
5
+ import os
6
+
7
+ def check_dependencies():
8
+ """Check and install missing dependencies."""
9
+ print("Checking and installing dependencies...")
10
+
11
+ # Create necessary directories
12
+ os.makedirs("models/speech_encoder", exist_ok=True)
13
+ os.makedirs("vocoder", exist_ok=True)
14
+
15
+ # Download vocoder if needed (this will be done on deployment)
16
+ if not os.path.exists("vocoder/g_00500000"):
17
+ print("Vocoder will be downloaded when deployed")
18
+
19
+ # Return success message
20
+ return "✅ Setup ready for deployment!"
21
+
22
+ def launch_services():
23
+ """Prepare to launch all services."""
24
+ return """
25
+ # LLaMA-Omni Services
26
+
27
+ When deployed to Gradio Spaces, this app will:
28
+
29
+ 1. Download required models (Whisper, LLaMA-Omni, vocoder)
30
+ 2. Start the controller
31
+ 3. Start the model worker
32
+ 4. Launch the web interface
33
+
34
+ ## Notes
35
+ - The model will be loaded automatically during deployment
36
+ - Audio can be processed via both speech input and text input
37
+ - The full system allows for seamless speech interaction
38
+ """
39
+
40
+ # Create the demo
41
+ with gr.Blocks() as demo:
42
+ gr.Markdown("# 🦙🎧 LLaMA-Omni Deployment Setup")
43
+
44
+ with gr.Tab("Status"):
45
+ status = gr.Markdown(launch_services())
46
+
47
+ with gr.Tab("Setup"):
48
+ check_btn = gr.Button("Check Dependencies")
49
+ result = gr.Textbox(label="Setup Status")
50
+ check_btn.click(check_dependencies, outputs=result)
51
+
52
+ with gr.Tab("About"):
53
+ gr.Markdown("""
54
+ # About LLaMA-Omni
55
+
56
+ LLaMA-Omni is a speech-language model built upon Llama-3.1-8B-Instruct. It supports low-latency and high-quality speech interactions, simultaneously generating both text and speech responses based on speech instructions.
57
+
58
+ ## Features
59
+
60
+ * Built on Llama-3.1-8B-Instruct, ensuring high-quality responses
61
+ * Low-latency speech interaction with a latency as low as 226ms
62
+ * Simultaneous generation of both text and speech responses
63
+
64
+ ## License
65
+
66
+ This code is released under the Apache-2.0 License. The model is intended for academic research purposes only and may NOT be used for commercial purposes.
67
+
68
+ Original work by Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng.
69
+ """)
70
+
71
+ # Launch the app
72
+ if __name__ == "__main__":
73
+ demo.launch()
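
Note (not part of the commit): the Status tab above describes the deployment sequence (start the controller, then a model worker, then the web interface). Below is a minimal local sketch of that sequence, assuming the omni_speech.serve modules added in this commit are run with their default arguments; model_worker.py is referenced by check_setup.py but not shown here, so it is started without flags as an assumption.

# Sketch only: launches the services described above as subprocesses.
import subprocess
import sys
import time

controller = subprocess.Popen(
    [sys.executable, "-m", "omni_speech.serve.controller", "--port", "10000"]
)
time.sleep(5)  # give the controller time to start accepting registrations

# model_worker.py is not included in this commit, so its flags are assumed.
worker = subprocess.Popen([sys.executable, "-m", "omni_speech.serve.model_worker"])

web_ui = subprocess.Popen(
    [sys.executable, "-m", "omni_speech.serve.gradio_web_server", "--port", "8000"]
)

web_ui.wait()
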
launch_llama_omni2.py DELETED
@@ -1,486 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- LLaMA-Omni2 Direct Launcher
4
- ---------------------------
5
- This script extracts and directly runs the LLaMA-Omni2 components without
6
- relying on package imports.
7
- """
8
-
9
- import os
10
- import sys
11
- import subprocess
12
- import time
13
- import argparse
14
- import shutil
15
- import importlib.util
16
- import tempfile
17
- import logging
18
-
19
- # Configure logging
20
- logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
21
- logger = logging.getLogger(__name__)
22
-
23
- # Define paths
24
- EXTRACTION_DIR = "/home/user/app/llama_omni2_extracted"
25
- MODELS_DIR = "/home/user/app/models"
26
- LLAMA_OMNI2_MODEL_NAME = "LLaMA-Omni2-0.5B"
27
- LLAMA_OMNI2_MODEL_PATH = f"{MODELS_DIR}/{LLAMA_OMNI2_MODEL_NAME}"
28
- COSYVOICE_PATH = f"{MODELS_DIR}/cosy2_decoder"
29
-
30
- # Importe o model_downloader se disponível
31
- try:
32
- from model_downloader import download_model_if_needed, download_all_models, get_model_repo_id, NO_DOWNLOAD
33
- has_model_downloader = True
34
- except ImportError:
35
- has_model_downloader = False
36
- NO_DOWNLOAD = False
37
-
38
- # Garantir que os modelos estão disponíveis
39
- def ensure_models_available():
40
- """Garante que os modelos necessários estão disponíveis"""
41
- if has_model_downloader:
42
- if NO_DOWNLOAD:
43
- logger.info("Modo NO_DOWNLOAD ativado. Os modelos não serão baixados, usando diretamente do Hugging Face Hub.")
44
- return
45
-
46
- logger.info("Verificando modelos necessários para o LLaMA-Omni2...")
47
- download_model_if_needed("llama_omni2")
48
- download_model_if_needed("cosy2_decoder")
49
- download_model_if_needed("speech_encoder")
50
- logger.info("Verificação de modelos concluída")
51
- else:
52
- logger.warning("model_downloader não está disponível. Os modelos devem estar disponíveis em: " + MODELS_DIR)
53
-
54
- # Additional imports
55
- def download_dependencies():
56
- """Download and install required Python packages for LLaMA-Omni2"""
57
- print("Installing required dependencies...")
58
- dependencies = [
59
- "gradio>=3.50.2",
60
- "fastapi",
61
- "uvicorn",
62
- "pydantic",
63
- "transformers>=4.36.2",
64
- "sentencepiece",
65
- "huggingface_hub"
66
- ]
67
-
68
- try:
69
- subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade"] + dependencies, check=True)
70
- print("Dependencies installed successfully")
71
- return True
72
- except subprocess.CalledProcessError as e:
73
- print(f"Error installing dependencies: {e}")
74
- return False
75
-
76
- def ensure_module_structure(extraction_dir):
77
- """Ensure that the extracted module has the necessary structure"""
78
- print("Ensuring proper module structure...")
79
-
80
- # Create __init__.py files if they don't exist
81
- module_dirs = [
82
- os.path.join(extraction_dir, "llama_omni2"),
83
- os.path.join(extraction_dir, "llama_omni2", "serve"),
84
- os.path.join(extraction_dir, "llama_omni2", "model"),
85
- os.path.join(extraction_dir, "llama_omni2", "common")
86
- ]
87
-
88
- for dir_path in module_dirs:
89
- os.makedirs(dir_path, exist_ok=True)
90
- init_file = os.path.join(dir_path, "__init__.py")
91
- if not os.path.exists(init_file):
92
- with open(init_file, 'w') as f:
93
- f.write("# Auto-generated __init__.py file\n")
94
- print(f"Created {init_file}")
95
-
96
- # Create missing module files with required constants and functions
97
- dummy_modules = {
98
- # Utils module
99
- os.path.join(extraction_dir, "llama_omni2", "utils.py"): """
100
- # Dummy utils module
101
- def dummy_function():
102
- pass
103
- """,
104
- # Constants module - required by controller.py and model_worker.py
105
- os.path.join(extraction_dir, "llama_omni2", "constants.py"): """
106
- # Constants required by LLaMA-Omni2 modules
107
-
108
- # Controller constants
109
- CONTROLLER_HEART_BEAT_EXPIRATION = 120
110
- CONTROLLER_STATUS_POLLING_INTERVAL = 15
111
-
112
- # Worker constants
113
- WORKER_HEART_BEAT_INTERVAL = 30
114
- WORKER_API_TIMEOUT = 100
115
-
116
- # Other constants that might be needed
117
- DEFAULT_PORT = 8000
118
- """
119
- }
120
-
121
- for file_path, content in dummy_modules.items():
122
- if not os.path.exists(file_path):
123
- with open(file_path, 'w') as f:
124
- f.write(content)
125
- print(f"Created {file_path}")
126
-
127
- return True
128
-
129
- def start_controller():
130
- """Start the LLaMA-Omni2 controller directly"""
131
- print("=== Starting LLaMA-Omni2 Controller ===")
132
-
133
- # First try to use our custom implementation
134
- direct_controller_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "controller.py")
135
- if os.path.exists(direct_controller_path):
136
- print(f"Using custom controller implementation: {direct_controller_path}")
137
- cmd = [
138
- sys.executable, direct_controller_path,
139
- "--host", "0.0.0.0",
140
- "--port", "10000"
141
- ]
142
-
143
- env = os.environ.copy()
144
- process = subprocess.Popen(cmd, env=env)
145
- print(f"Controller started with PID: {process.pid}")
146
- return process
147
-
148
- # Fall back to a simple controller implementation
149
- print("No controller script found. Implementing a simple controller...")
150
-
151
- try:
152
- from fastapi import FastAPI, HTTPException
153
- import uvicorn
154
- from pydantic import BaseModel
155
- import threading
156
-
157
- app = FastAPI()
158
-
159
- class ModelInfo(BaseModel):
160
- model_name: str
161
- worker_name: str
162
- worker_addr: str
163
-
164
- # Simple in-memory storage
165
- registered_models = {}
166
-
167
- @app.get("/")
168
- def read_root():
169
- return {"status": "ok", "models": list(registered_models.keys())}
170
-
171
- @app.get("/api/v1/models")
172
- def list_models():
173
- return {"models": list(registered_models.keys())}
174
-
175
- @app.post("/api/v1/register_worker")
176
- def register_worker(model_info: ModelInfo):
177
- registered_models[model_info.model_name] = {
178
- "worker_name": model_info.worker_name,
179
- "worker_addr": model_info.worker_addr
180
- }
181
- return {"status": "ok"}
182
-
183
- # Start a simple controller
184
- def run_controller():
185
- uvicorn.run(app, host="0.0.0.0", port=10000)
186
-
187
- thread = threading.Thread(target=run_controller, daemon=True)
188
- thread.start()
189
-
190
- print("Simple controller started on port 10000")
191
- # Return a dummy process for compatibility
192
- class DummyProcess:
193
- def __init__(self):
194
- self.pid = 0
195
- def terminate(self):
196
- pass
197
- def poll(self):
198
- return None
199
- def wait(self, timeout=None):
200
- pass
201
-
202
- return DummyProcess()
203
-
204
- except ImportError as e:
205
- print(f"Failed to create simple controller: {e}")
206
- return None
207
-
208
- def start_model_worker():
209
- """Start the LLaMA-Omni2 model worker directly"""
210
- print("=== Starting LLaMA-Omni2 Model Worker ===")
211
-
212
- # First try to use our custom implementation
213
- direct_worker_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "model_worker.py")
214
- if os.path.exists(direct_worker_path):
215
- print(f"Using custom model worker implementation: {direct_worker_path}")
216
- cmd = [
217
- sys.executable, direct_worker_path,
218
- "--host", "0.0.0.0",
219
- "--controller", "http://localhost:10000",
220
- "--port", "40000",
221
- "--worker", "http://localhost:40000",
222
- "--model-path", LLAMA_OMNI2_MODEL_PATH,
223
- "--model-name", LLAMA_OMNI2_MODEL_NAME
224
- ]
225
-
226
- env = os.environ.copy()
227
- process = subprocess.Popen(cmd, env=env)
228
- print(f"Model worker started with PID: {process.pid}")
229
- return process
230
-
231
- # Fall back to a simple implementation
232
- print("No model worker script found. Will try to start Gradio directly with the model.")
233
-
234
- class DummyProcess:
235
- def __init__(self):
236
- self.pid = 0
237
- def terminate(self):
238
- pass
239
- def poll(self):
240
- return None
241
- def wait(self, timeout=None):
242
- pass
243
-
244
- return DummyProcess()
245
-
246
- def start_gradio_server():
247
- """Start the LLaMA-Omni2 Gradio web server directly"""
248
- print("=== Starting LLaMA-Omni2 Gradio Server ===")
249
-
250
- # First try to use our custom implementation
251
- direct_gradio_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "gradio_web_server.py")
252
- if os.path.exists(direct_gradio_path):
253
- print(f"Using custom Gradio server implementation: {direct_gradio_path}")
254
- cmd = [
255
- sys.executable, direct_gradio_path,
256
- "--host", "0.0.0.0",
257
- "--port", "7860",
258
- "--controller-url", "http://localhost:10000",
259
- "--vocoder-dir", COSYVOICE_PATH
260
- ]
261
-
262
- env = os.environ.copy()
263
- process = subprocess.Popen(cmd, env=env)
264
- print(f"Gradio server started with PID: {process.pid}")
265
- return process
266
-
267
- # Fall back to a simple Gradio implementation
268
- print("No Gradio server found. Attempting to create a simple interface...")
269
-
270
- try:
271
- import gradio as gr
272
- import threading
273
- from transformers import AutoModelForCausalLM, AutoTokenizer
274
- import torch
275
-
276
- # Simple function to launch a basic Gradio interface
277
- def launch_simple_gradio():
278
- try:
279
- print(f"Loading model from {LLAMA_OMNI2_MODEL_PATH}...")
280
- # Check for CUDA availability
281
- device = "cuda" if torch.cuda.is_available() else "cpu"
282
- print(f"Using device: {device}")
283
-
284
- if device == "cuda":
285
- print(f"CUDA Device: {torch.cuda.get_device_name(0)}")
286
- print(f"CUDA Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
287
-
288
- tokenizer = AutoTokenizer.from_pretrained(LLAMA_OMNI2_MODEL_PATH)
289
- model = AutoModelForCausalLM.from_pretrained(LLAMA_OMNI2_MODEL_PATH).to(device)
290
-
291
- def generate_text(input_text):
292
- inputs = tokenizer(input_text, return_tensors="pt").to(device)
293
- outputs = model.generate(inputs.input_ids, max_length=100)
294
- return tokenizer.decode(outputs[0], skip_special_tokens=True)
295
-
296
- with gr.Blocks() as demo:
297
- gr.Markdown("# LLaMA-Omni2 Simple Interface")
298
- with gr.Tab("Text Generation"):
299
- input_text = gr.Textbox(label="Input Text")
300
- output_text = gr.Textbox(label="Generated Text")
301
- generate_btn = gr.Button("Generate")
302
- generate_btn.click(generate_text, inputs=input_text, outputs=output_text)
303
-
304
- demo.launch(server_name="0.0.0.0", server_port=7860)
305
-
306
- except Exception as e:
307
- print(f"Error in simple Gradio interface: {e}")
308
-
309
- thread = threading.Thread(target=launch_simple_gradio, daemon=True)
310
- thread.start()
311
-
312
- print("Simple Gradio interface started on port 7860")
313
-
314
- class DummyProcess:
315
- def __init__(self):
316
- self.pid = 0
317
- def terminate(self):
318
- pass
319
- def poll(self):
320
- return None
321
- def wait(self, timeout=None):
322
- pass
323
-
324
- return DummyProcess()
325
-
326
- except ImportError as e:
327
- print(f"Failed to create simple Gradio interface: {e}")
328
- return None
329
-
330
- def patch_extracted_files(extraction_dir):
331
- """Patch the extracted Python files to handle missing imports"""
332
- print("Patching extracted Python files to handle missing imports...")
333
-
334
- # Define files to patch and their imports to check/fix
335
- files_to_patch = {
336
- os.path.join(extraction_dir, "llama_omni2", "serve", "controller.py"): [
337
- "from llama_omni2.constants import",
338
- "from llama_omni2.model import",
339
- "from llama_omni2.common import",
340
- ],
341
- os.path.join(extraction_dir, "llama_omni2", "serve", "model_worker.py"): [
342
- "from llama_omni2.constants import",
343
- "from llama_omni2.model import",
344
- "from llama_omni2.common import",
345
- ],
346
- os.path.join(extraction_dir, "llama_omni2", "serve", "gradio_web_server.py"): [
347
- "from llama_omni2.constants import",
348
- "from llama_omni2.model import",
349
- "from llama_omni2.common import",
350
- ]
351
- }
352
-
353
- patched_files = []
354
- for file_path, imports_to_check in files_to_patch.items():
355
- if not os.path.exists(file_path):
356
- print(f"Warning: File {file_path} not found, skipping patch")
357
- continue
358
-
359
- with open(file_path, 'r') as f:
360
- content = f.read()
361
-
362
- original_content = content
363
- modified = False
364
-
365
- # Add try-except blocks around problematic imports
366
- for import_line in imports_to_check:
367
- if import_line in content:
368
- # Find the full line containing this import
369
- import_lines = [line for line in content.split('\n') if import_line in line]
370
-
371
- for full_line in import_lines:
372
- # Extract the variable names being imported
373
- try:
374
- imported_vars = full_line.split('import')[1].strip().split(',')
375
- imported_vars = [var.strip() for var in imported_vars]
376
-
377
- # Create a try-except block with fallback definitions
378
- replacement = f"""try:
379
- {full_line}
380
- except ImportError:
381
- # Auto-generated fallback for missing import
382
- print("Warning: Creating fallback for missing import: {full_line}")
383
- """
384
- for var in imported_vars:
385
- if var: # Skip empty strings
386
- replacement += f" {var} = object() # Dummy placeholder\n"
387
-
388
- # Replace the original import with the try-except block
389
- content = content.replace(full_line, replacement)
390
- modified = True
391
- except Exception as e:
392
- print(f"Error processing import line '{full_line}': {e}")
393
-
394
- # Write the modified content back if changes were made
395
- if modified:
396
- with open(file_path, 'w') as f:
397
- f.write(content)
398
- patched_files.append(file_path)
399
- print(f"Patched file: {file_path}")
400
-
401
- if patched_files:
402
- print(f"Successfully patched {len(patched_files)} files")
403
- else:
404
- print("No files needed patching")
405
-
406
- return patched_files
407
-
408
- def main():
409
- """Main entry point for the launcher script"""
410
- parser = argparse.ArgumentParser(description="LLaMA-Omni2 Direct Launcher")
411
- parser.add_argument("--skip-download", action="store_true", help="Skip downloading dependencies")
412
- parser.add_argument("--no-model-download", action="store_true", help="Don't download models, use them directly from HF Hub")
413
- parser.add_argument("--extraction-dir", type=str, default=EXTRACTION_DIR, help="Directory to extract LLaMA-Omni2 to")
414
- parser.add_argument("--models-dir", type=str, default=MODELS_DIR, help="Directory containing models")
415
- parser.add_argument("--skip-modules", action="store_true", help="Skip module structure creation")
416
- parser.add_argument("--controller-only", action="store_true", help="Start only the controller")
417
- parser.add_argument("--worker-only", action="store_true", help="Start only the model worker")
418
- parser.add_argument("--gradio-only", action="store_true", help="Start only the Gradio interface")
419
- args = parser.parse_args()
420
-
421
- # Update paths based on arguments
422
- global EXTRACTION_DIR, MODELS_DIR, LLAMA_OMNI2_MODEL_PATH, COSYVOICE_PATH
423
- EXTRACTION_DIR = args.extraction_dir
424
- MODELS_DIR = args.models_dir
425
- LLAMA_OMNI2_MODEL_PATH = f"{MODELS_DIR}/{LLAMA_OMNI2_MODEL_NAME}"
426
- COSYVOICE_PATH = f"{MODELS_DIR}/cosy2_decoder"
427
-
428
- # Set NO_DOWNLOAD environment variable if --no-model-download is specified
429
- if args.no_model_download:
430
- os.environ["NO_DOWNLOAD"] = "1"
431
- global NO_DOWNLOAD
432
- NO_DOWNLOAD = True
433
- logger.info("Modo NO_DOWNLOAD ativado via linha de comando")
434
-
435
- print("=== LLaMA-Omni2 Direct Launcher ===")
436
- print(f"Extraction directory: {EXTRACTION_DIR}")
437
- print(f"Models directory: {MODELS_DIR}")
438
- print(f"Downloading models: {'No' if NO_DOWNLOAD else 'Yes'}")
439
-
440
- # Ensure models are available
441
- ensure_models_available()
442
-
443
- # Download dependencies if needed
444
- if not args.skip_download:
445
- download_dependencies()
446
-
447
- # Create module structure if needed
448
- if not args.skip_modules:
449
- ensure_module_structure(EXTRACTION_DIR)
450
-
451
- # Start the controller if needed
452
- controller_process = None
453
- if not args.worker_only and not args.gradio_only:
454
- controller_process = start_controller()
455
- # Give the controller time to start up
456
- time.sleep(5)
457
-
458
- # Start the model worker if needed
459
- worker_process = None
460
- if not args.controller_only and not args.gradio_only:
461
- worker_process = start_model_worker()
462
- # Give the worker time to start up
463
- time.sleep(5)
464
-
465
- # Start the Gradio interface if needed
466
- gradio_process = None
467
- if not args.controller_only and not args.worker_only:
468
- gradio_process = start_gradio_server()
469
-
470
- # Keep the main process running to maintain subprocesses
471
- try:
472
- print("Press Ctrl+C to exit...")
473
- while True:
474
- time.sleep(1)
475
- except KeyboardInterrupt:
476
- print("Shutting down...")
477
- if controller_process:
478
- controller_process.terminate()
479
- if worker_process:
480
- worker_process.terminate()
481
- if gradio_process:
482
- gradio_process.terminate()
483
- print("Shutdown complete")
484
-
485
- if __name__ == "__main__":
486
- sys.exit(main())
model_downloader.py DELETED
@@ -1,219 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Model Downloader for LLaMA-Omni2
4
- ---------------------------------
5
- This script manages the automatic download of the models required by LLaMA-Omni2.
6
- Models are downloaded only when needed, during start-up.
7
- """
8
-
9
- import os
10
- import sys
11
- import logging
12
- import huggingface_hub
13
- from huggingface_hub import snapshot_download, hf_hub_download
14
- from pathlib import Path
15
- import torch
16
- import shutil
17
-
18
- # Configurar logging
19
- logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
20
- logger = logging.getLogger(__name__)
21
-
22
- # Configurações de modelos
23
- MODELS_DIR = os.environ.get("MODELS_DIR", "models")
24
- HF_TOKEN = os.environ.get("HF_TOKEN", None)
25
-
26
- # Modo sem download (NO_DOWNLOAD=1)
27
- NO_DOWNLOAD = os.environ.get("NO_DOWNLOAD", "0").lower() in ("1", "true", "yes")
28
-
29
- # Mensagem de debug para verificar o status da variável
30
- logger.info(f"Inicializando model_downloader.py com NO_DOWNLOAD={NO_DOWNLOAD} (valor da env: {os.environ.get('NO_DOWNLOAD', 'não definido')})")
31
-
32
- # Modelos necessários
33
- MODEL_CONFIGS = {
34
- "speech_encoder": {
35
- "repo_id": "openai/whisper-large-v3",
36
- "local_dir": os.path.join(MODELS_DIR, "speech_encoder", "whisper-large-v3"),
37
- "files": None, # None significa baixar o modelo completo
38
- },
39
- "cosy2_decoder": {
40
- "repo_id": "ICTNLP/cosy2_decoder",
41
- "local_dir": os.path.join(MODELS_DIR, "cosy2_decoder"),
42
- "files": [
43
- "flow.decoder.estimator.fp32.onnx",
44
- "flow.decoder.estimator.fp16.A10.plan",
45
- "flow.encoder.fp32.zip",
46
- "flow.decoder.estimator.fp16.Volta.plan",
47
- "hift.pt",
48
- "campplus.onnx",
49
- "cosyvoice.yaml",
50
- ],
51
- },
52
- "llama_omni2": {
53
- "repo_id": "ICTNLP/LLaMA-Omni2-0.5B",
54
- "local_dir": os.path.join(MODELS_DIR, "LLaMA-Omni2-0.5B"),
55
- "files": None, # None significa baixar o modelo completo
56
- }
57
- }
58
-
59
- def ensure_model_dir():
60
- """Garante que o diretório models existe"""
61
- if NO_DOWNLOAD:
62
- logger.info("Modo NO_DOWNLOAD ativado. Pulando criação de diretórios.")
63
- return
64
-
65
- os.makedirs(MODELS_DIR, exist_ok=True)
66
- for model_config in MODEL_CONFIGS.values():
67
- os.makedirs(model_config["local_dir"], exist_ok=True)
68
-
69
- def is_model_downloaded(model_key):
70
- """Verifica se um modelo já foi baixado"""
71
- # No modo sem download, sempre retorna False para pular a verificação
72
- if NO_DOWNLOAD:
73
- logger.info(f"Modo NO_DOWNLOAD ativado. Pulando verificação para {model_key}.")
74
- return False
75
-
76
- config = MODEL_CONFIGS[model_key]
77
- local_dir = config["local_dir"]
78
-
79
- # Se não temos uma lista específica de arquivos, verificar apenas se o diretório existe
80
- if config["files"] is None:
81
- # Verificar se o diretório existe e tem arquivos
82
- if os.path.exists(local_dir) and any(os.listdir(local_dir)):
83
- logger.info(f"Modelo {model_key} já parece estar baixado em {local_dir}")
84
- return True
85
- return False
86
-
87
- # Verificar se todos os arquivos específicos existem
88
- for file in config["files"]:
89
- file_path = os.path.join(local_dir, file)
90
- if not os.path.exists(file_path):
91
- logger.info(f"Arquivo {file} não encontrado para o modelo {model_key}")
92
- return False
93
-
94
- logger.info(f"Todos os arquivos para o modelo {model_key} já estão disponíveis em {local_dir}")
95
- return True
96
-
97
- def download_model(model_key):
98
- """Baixa um modelo específico do Hugging Face Hub"""
99
- # Verificar o modo sem download
100
- if NO_DOWNLOAD:
101
- logger.warning(f"Modo NO_DOWNLOAD ativado. Pulando download de {model_key}")
102
- return False
103
-
104
- config = MODEL_CONFIGS[model_key]
105
- repo_id = config["repo_id"]
106
- local_dir = config["local_dir"]
107
- files = config["files"]
108
-
109
- try:
110
- logger.info(f"Baixando modelo {model_key} do repo {repo_id}...")
111
-
112
- # Se temos uma lista específica de arquivos, baixar um por um
113
- if files is not None:
114
- for file in files:
115
- file_path = os.path.join(local_dir, file)
116
-
117
- # Pular se o arquivo já existe
118
- if os.path.exists(file_path):
119
- logger.info(f"Arquivo {file} já existe, pulando download")
120
- continue
121
-
122
- logger.info(f"Baixando arquivo {file} para {file_path}")
123
- try:
124
- hf_hub_download(
125
- repo_id=repo_id,
126
- filename=file,
127
- local_dir=local_dir,
128
- local_dir_use_symlinks=False,
129
- token=HF_TOKEN
130
- )
131
- except Exception as e:
132
- logger.warning(f"Erro ao baixar arquivo {file}: {e}. Tentando continuar.")
133
- else:
134
- # Baixar o modelo completo
135
- snapshot_download(
136
- repo_id=repo_id,
137
- local_dir=local_dir,
138
- local_dir_use_symlinks=False,
139
- token=HF_TOKEN
140
- )
141
-
142
- logger.info(f"Modelo {model_key} baixado com sucesso para {local_dir}")
143
- return True
144
- except Exception as e:
145
- logger.error(f"Erro ao baixar modelo {model_key}: {e}")
146
- return False
147
-
148
- def cleanup_model_dir(model_key):
149
- """Remove arquivos incompletos ou corruptos de um diretório de modelo"""
150
- # Verificar o modo sem download
151
- if NO_DOWNLOAD:
152
- logger.info(f"Modo NO_DOWNLOAD ativado. Pulando limpeza de diretório para {model_key}.")
153
- return True
154
-
155
- config = MODEL_CONFIGS[model_key]
156
- local_dir = config["local_dir"]
157
-
158
- try:
159
- # Procurar por arquivos .incomplete e removê-los
160
- for root, dirs, files in os.walk(local_dir):
161
- for file in files:
162
- if file.endswith(".incomplete"):
163
- file_path = os.path.join(root, file)
164
- logger.info(f"Removendo arquivo incompleto: {file_path}")
165
- os.remove(file_path)
166
-
167
- return True
168
- except Exception as e:
169
- logger.error(f"Erro ao limpar diretório do modelo {model_key}: {e}")
170
- return False
171
-
172
- def download_all_models():
173
- """Baixa todos os modelos configurados, se necessário"""
174
- # Verificar o modo sem download
175
- if NO_DOWNLOAD:
176
- logger.warning("Modo NO_DOWNLOAD ativado. Nenhum modelo será baixado.")
177
- return
178
-
179
- ensure_model_dir()
180
-
181
- for model_key in MODEL_CONFIGS:
182
- if not is_model_downloaded(model_key):
183
- logger.info(f"Iniciando download do modelo {model_key}")
184
- cleanup_model_dir(model_key)
185
- download_model(model_key)
186
- else:
187
- logger.info(f"Modelo {model_key} já está disponível localmente")
188
-
189
- def download_model_if_needed(model_key):
190
- """Baixa um modelo específico se ele não estiver disponível"""
191
- # Verificar o modo sem download
192
- if NO_DOWNLOAD:
193
- logger.info(f"Modo NO_DOWNLOAD ativado. Usando repo_id diretamente para {model_key}")
194
- return False
195
-
196
- ensure_model_dir()
197
-
198
- if model_key not in MODEL_CONFIGS:
199
- logger.error(f"Modelo {model_key} não está configurado para download")
200
- return False
201
-
202
- if not is_model_downloaded(model_key):
203
- logger.info(f"Modelo {model_key} não encontrado localmente. Iniciando download...")
204
- cleanup_model_dir(model_key)
205
- return download_model(model_key)
206
- else:
207
- logger.info(f"Modelo {model_key} já está disponível localmente")
208
- return True
209
-
210
- def get_model_repo_id(model_key):
211
- """Retorna o repo_id do modelo para uso direto sem download"""
212
- if model_key not in MODEL_CONFIGS:
213
- logger.error(f"Modelo {model_key} não está configurado")
214
- return None
215
- return MODEL_CONFIGS[model_key]["repo_id"]
216
-
217
- if __name__ == "__main__":
218
- # Se executado diretamente, baixar todos os modelos
219
- download_all_models()
no_download.py DELETED
@@ -1,55 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Script to start applications in no-download mode.
4
- This script explicitly sets the NO_DOWNLOAD=1 variable in the Python environment,
5
- ensuring that no models are downloaded.
6
- """
7
-
8
- import os
9
- import sys
10
- import subprocess
11
- import logging
12
-
13
- # Configurar logging
14
- logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
15
- logger = logging.getLogger("no_download")
16
-
17
- # Definir a variável NO_DOWNLOAD no ambiente
18
- os.environ["NO_DOWNLOAD"] = "1"
19
- logger.info(f"Variável NO_DOWNLOAD definida como: {os.environ.get('NO_DOWNLOAD')}")
20
-
21
- # Verificar argumentos de linha de comando
22
- if len(sys.argv) < 2:
23
- logger.info("Nenhum script especificado. Executando app.py por padrão.")
24
- target_script = "app.py"
25
- else:
26
- target_script = sys.argv[1]
27
- logger.info(f"Executando script: {target_script}")
28
-
29
- # Lista de argumentos extras
30
- args = sys.argv[2:]
31
-
32
- # Exibir informações
33
- print("=" * 70)
34
- print(f"Executando {target_script} no modo SEM DOWNLOAD (NO_DOWNLOAD=1)")
35
- print("Os modelos serão usados diretamente do Hugging Face Hub, sem baixar localmente")
36
- print("=" * 70)
37
-
38
- # Executar o script alvo com os mesmos argumentos
39
- try:
40
- # Criar um dicionário de ambiente com NO_DOWNLOAD definido
41
- env = os.environ.copy()
42
- env["NO_DOWNLOAD"] = "1"
43
-
44
- # Construir o comando
45
- command = [sys.executable, target_script] + args
46
- logger.info(f"Executando comando: {' '.join(command)}")
47
-
48
- # Execute o comando com o ambiente modificado
49
- process = subprocess.Popen(command, env=env)
50
- process.wait()
51
-
52
- sys.exit(process.returncode)
53
- except Exception as e:
54
- logger.error(f"Erro ao executar {target_script}: {e}")
55
- sys.exit(1)
omni_speech/__init__.py ADDED
File without changes
omni_speech/infer/__init__.py ADDED
File without changes
omni_speech/infer/examples/example.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "instructions": [
3
+ {
4
+ "id": "001",
5
+ "input_type": "speech",
6
+ "audio_path": "input_audio.wav",
7
+ "transcription": "What is the weather like today?"
8
+ },
9
+ {
10
+ "id": "002",
11
+ "input_type": "text",
12
+ "text": "Tell me about the history of artificial intelligence."
13
+ }
14
+ ]
15
+ }
omni_speech/infer/inference.py ADDED
@@ -0,0 +1,125 @@
1
+ import argparse
2
+ import json
3
+ import os
4
+ import logging
5
+ from typing import Dict, List, Optional
6
+
7
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
8
+ logger = logging.getLogger(__name__)
9
+
10
+ def load_model():
11
+ """Load LLaMA-Omni model for inference (placeholder)."""
12
+ logger.info("Loading LLaMA-Omni model...")
13
+ logger.info("Note: In a real deployment, the model would be downloaded from Hugging Face")
14
+ return "PLACEHOLDER_MODEL"
15
+
16
+ def load_vocoder():
17
+ """Load vocoder for speech synthesis (placeholder)."""
18
+ logger.info("Loading vocoder...")
19
+ logger.info("Note: In a real deployment, the vocoder would be downloaded")
20
+ return "PLACEHOLDER_VOCODER"
21
+
22
+ def transcribe_audio(audio_path):
23
+ """Transcribe audio using Whisper (placeholder)."""
24
+ logger.info(f"Transcribing audio: {audio_path}")
25
+ # In a real implementation, this would use the Whisper model
26
+ return f"Placeholder transcription for {os.path.basename(audio_path)}"
27
+
28
+ def process_instruction(instruction, model, vocoder):
29
+ """Process a single instruction."""
30
+ instruction_id = instruction.get("id", "unknown")
31
+ input_type = instruction.get("input_type")
32
+
33
+ logger.info(f"Processing instruction {instruction_id}, type: {input_type}")
34
+
35
+ if input_type == "speech":
36
+ audio_path = instruction.get("audio_path")
37
+ if not audio_path:
38
+ logger.error(f"Instruction {instruction_id}: Missing audio path")
39
+ return None
40
+
41
+ # Check if transcription is provided, otherwise transcribe
42
+ transcription = instruction.get("transcription")
43
+ if not transcription:
44
+ transcription = transcribe_audio(audio_path)
45
+
46
+ # In a real implementation, this would process the transcription through the model
47
+ text_response = f"Placeholder response to: {transcription}"
48
+
49
+ # In a real implementation, this would generate speech from the text response
50
+ speech_output = "PLACEHOLDER_SPEECH_OUTPUT"
51
+
52
+ return {
53
+ "id": instruction_id,
54
+ "input_type": input_type,
55
+ "transcription": transcription,
56
+ "text_response": text_response,
57
+ "speech_output": speech_output
58
+ }
59
+
60
+ elif input_type == "text":
61
+ text = instruction.get("text")
62
+ if not text:
63
+ logger.error(f"Instruction {instruction_id}: Missing text")
64
+ return None
65
+
66
+ # In a real implementation, this would process the text through the model
67
+ text_response = f"Placeholder response to: {text}"
68
+
69
+ # In a real implementation, this would generate speech from the text response
70
+ speech_output = "PLACEHOLDER_SPEECH_OUTPUT"
71
+
72
+ return {
73
+ "id": instruction_id,
74
+ "input_type": input_type,
75
+ "text": text,
76
+ "text_response": text_response,
77
+ "speech_output": speech_output
78
+ }
79
+
80
+ else:
81
+ logger.error(f"Instruction {instruction_id}: Unknown input type: {input_type}")
82
+ return None
83
+
84
+ def process_instructions(input_file, output_dir):
85
+ """Process instructions from input file and save results to output directory."""
86
+ # Create output directory if it doesn't exist
87
+ os.makedirs(output_dir, exist_ok=True)
88
+
89
+ # Load input JSON
90
+ with open(input_file, 'r') as f:
91
+ data = json.load(f)
92
+
93
+ instructions = data.get("instructions", [])
94
+ logger.info(f"Loaded {len(instructions)} instructions from {input_file}")
95
+
96
+ # Load model and vocoder
97
+ model = load_model()
98
+ vocoder = load_vocoder()
99
+
100
+ # Process each instruction
101
+ results = []
102
+ for instruction in instructions:
103
+ result = process_instruction(instruction, model, vocoder)
104
+ if result:
105
+ results.append(result)
106
+
107
+ # Save results
108
+ output_file = os.path.join(output_dir, f"{os.path.basename(input_file)}_results.json")
109
+ with open(output_file, 'w') as f:
110
+ json.dump({"results": results}, f, indent=2)
111
+
112
+ logger.info(f"Saved {len(results)} results to {output_file}")
113
+
114
+ def main():
115
+ """Run inference."""
116
+ parser = argparse.ArgumentParser(description="LLaMA-Omni inference")
117
+ parser.add_argument("--input", type=str, required=True, help="Input JSON file with instructions")
118
+ parser.add_argument("--output", type=str, required=True, help="Output directory for results")
119
+
120
+ args = parser.parse_args()
121
+
122
+ process_instructions(args.input, args.output)
123
+
124
+ if __name__ == "__main__":
125
+ main()
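
Note (not part of the commit): the placeholder pipeline above can also be driven programmatically; a minimal example using the example.json shipped in this commit is shown below.

from omni_speech.infer.inference import process_instructions

# Reads the bundled example instructions and writes *_results.json into results/example.
process_instructions("omni_speech/infer/examples/example.json", "results/example")
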
omni_speech/infer/run.sh ADDED
@@ -0,0 +1,32 @@
1
+ #!/bin/bash
2
+
3
+ # Run inference on LLaMA-Omni model
4
+ # Usage: bash run.sh <examples_directory>
5
+
6
+ EXAMPLES_DIR=$1
7
+
8
+ if [ -z "$EXAMPLES_DIR" ]; then
9
+ echo "Error: Examples directory not specified"
10
+ echo "Usage: bash run.sh <examples_directory>"
11
+ exit 1
12
+ fi
13
+
14
+ if [ ! -d "$EXAMPLES_DIR" ]; then
15
+ echo "Error: Directory $EXAMPLES_DIR does not exist"
16
+ exit 1
17
+ fi
18
+
19
+ # Check if the model and vocoder exist (placeholders for real implementation)
20
+ echo "Checking if required models are available..."
21
+ echo "Note: In a real deployment, the model would be downloaded from Hugging Face"
22
+
23
+ # Process each JSON file in the examples directory
24
+ for json_file in "$EXAMPLES_DIR"/*.json; do
25
+ if [ -f "$json_file" ]; then
26
+ echo "Processing $json_file..."
27
+ # In a real implementation, this would call a Python script
28
+ echo "python -m omni_speech.infer.inference --input $json_file --output results/$(basename $json_file .json)"
29
+ fi
30
+ done
31
+
32
+ echo "Inference complete."
omni_speech/serve/__init__.py ADDED
File without changes
omni_speech/serve/controller.py ADDED
@@ -0,0 +1,104 @@
1
+ import argparse
2
+ import asyncio
3
+ import json
4
+ import time
5
+ from fastapi import FastAPI, WebSocket, HTTPException
6
+ from fastapi.middleware.cors import CORSMiddleware
7
+ import uvicorn
8
+ from typing import Dict, List, Optional, Union
9
+
10
+ app = FastAPI()
11
+
12
+ app.add_middleware(
13
+ CORSMiddleware,
14
+ allow_origins=["*"],
15
+ allow_credentials=True,
16
+ allow_methods=["*"],
17
+ allow_headers=["*"],
18
+ )
19
+
20
+ # Store worker information
21
+ worker_info = {}
22
+
23
+ @app.get("/status")
24
+ async def get_status():
25
+ """Get the status of the controller."""
26
+ return {"status": "ok", "worker_count": len(worker_info)}
27
+
28
+ @app.get("/worker_info")
29
+ async def get_worker_info():
30
+ """Get information about all registered workers."""
31
+ return {"worker_info": worker_info}
32
+
33
+ @app.post("/register_worker")
34
+ async def register_worker(worker_info_data: Dict):
35
+ """Register a new worker."""
36
+ worker_name = worker_info_data.get("name")
37
+ worker_url = worker_info_data.get("url")
38
+
39
+ if not worker_name or not worker_url:
40
+ raise HTTPException(status_code=400, detail="Missing name or URL for worker")
41
+
42
+ models = worker_info_data.get("models", [])
43
+
44
+ worker_info[worker_name] = {
45
+ "url": worker_url,
46
+ "models": models,
47
+ "status": "alive",
48
+ "last_heartbeat": time.time()
49
+ }
50
+
51
+ return {"status": "registered", "worker_name": worker_name}
52
+
53
+ @app.post("/unregister_worker")
54
+ async def unregister_worker(worker_name: str):
55
+ """Unregister a worker."""
56
+ if worker_name in worker_info:
57
+ del worker_info[worker_name]
58
+ return {"status": "unregistered", "worker_name": worker_name}
59
+ else:
60
+ raise HTTPException(status_code=404, detail=f"Worker {worker_name} not found")
61
+
62
+ @app.post("/heartbeat")
63
+ async def heartbeat(worker_data: Dict):
64
+ """Process worker heartbeat."""
65
+ worker_name = worker_data.get("name")
66
+
67
+ if worker_name in worker_info:
68
+ worker_info[worker_name]["last_heartbeat"] = time.time()
69
+ worker_info[worker_name]["status"] = "alive"
70
+ return {"status": "received"}
71
+ else:
72
+ raise HTTPException(status_code=404, detail=f"Worker {worker_name} not found")
73
+
74
+ @app.get("/get_worker_address")
75
+ async def get_worker_address(model_name: str):
76
+ """Get the address of a worker that hosts the requested model."""
77
+ for name, info in worker_info.items():
78
+ if model_name in info["models"] and info["status"] == "alive":
79
+ return {"worker_address": info["url"]}
80
+
81
+ raise HTTPException(status_code=404, detail=f"No available worker found for model {model_name}")
82
+
83
+ @app.get("/list_models")
84
+ async def list_models():
85
+ """List all available models across workers."""
86
+ available_models = []
87
+ for name, info in worker_info.items():
88
+ if info["status"] == "alive":
89
+ available_models.extend(info["models"])
90
+
91
+ return {"models": list(set(available_models))}
92
+
93
+ def main():
94
+ """Run the controller server."""
95
+ parser = argparse.ArgumentParser(description="LLaMA-Omni controller for managing worker nodes")
96
+ parser.add_argument("--host", type=str, default="0.0.0.0", help="Host to bind the server")
97
+ parser.add_argument("--port", type=int, default=10000, help="Port to bind the server")
98
+
99
+ args = parser.parse_args()
100
+
101
+ uvicorn.run(app, host=args.host, port=args.port, log_level="info")
102
+
103
+ if __name__ == "__main__":
104
+ main()
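
Note (not part of the commit): a worker can announce itself to this controller with plain HTTP calls against the endpoints defined above. The sketch below assumes the controller is running locally on its default port; the worker name, URL, and model name are illustrative.

import time
import requests

CONTROLLER = "http://localhost:10000"  # default host/port from controller.py

# Register the worker and the models it serves (values here are illustrative).
requests.post(f"{CONTROLLER}/register_worker", json={
    "name": "worker-1",
    "url": "http://localhost:40000",
    "models": ["LLaMA-Omni2-0.5B"],
})

# Keep the registration alive; the controller marks the worker "alive" on each beat.
for _ in range(3):
    requests.post(f"{CONTROLLER}/heartbeat", json={"name": "worker-1"})
    time.sleep(30)

print(requests.get(f"{CONTROLLER}/list_models").json())
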
omni_speech/serve/gradio_web_server.py ADDED
@@ -0,0 +1,234 @@
1
+ import argparse
2
+ import json
3
+ import os
4
+ import time
5
+ import requests
6
+ import gradio as gr
7
+ import uuid
8
+ import logging
9
+ from typing import Dict, List, Optional, Tuple, Union
10
+ import tempfile
11
+
12
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
13
+ logger = logging.getLogger(__name__)
14
+
15
+ # Global variables
16
+ controller_url = None
17
+ vocoder_path = None
18
+ vocoder_cfg = None
19
+ model_list_mode = "once" # "once" or "reload"
20
+ avatars = {}
21
+ message_history = {}
22
+
23
+ def list_models():
24
+ """Get list of available models from the controller."""
25
+ try:
26
+ response = requests.get(f"{controller_url}/list_models")
27
+ if response.status_code == 200:
28
+ models = response.json().get("models", [])
29
+ return models
30
+ else:
31
+ logger.error(f"Failed to list models: {response.text}")
32
+ return []
33
+ except Exception as e:
34
+ logger.error(f"Error listing models: {str(e)}")
35
+ return []
36
+
37
+ def get_worker_address(model_name):
38
+ """Get address of a worker that serves the requested model."""
39
+ try:
40
+ response = requests.get(f"{controller_url}/get_worker_address", params={"model_name": model_name})
41
+ if response.status_code == 200:
42
+ return response.json().get("worker_address")
43
+ else:
44
+ logger.error(f"Failed to get worker address: {response.text}")
45
+ return None
46
+ except Exception as e:
47
+ logger.error(f"Error getting worker address: {str(e)}")
48
+ return None
49
+
50
+ def transcribe_audio(audio_path):
51
+ """Placeholder for audio transcription."""
52
+ # In a real implementation, this would use the Whisper model
53
+ logger.info(f"Transcribing audio from {audio_path}...")
54
+ # Simulated transcription
55
+ return f"This is a placeholder transcription for audio file {os.path.basename(audio_path)}"
56
+
57
+ def process_speech_to_speech(audio_path, model_name):
58
+ """Process speech to speech generation."""
59
+ if not audio_path:
60
+ return "Error: No audio provided", None
61
+
62
+ try:
63
+ # Transcribe the audio
64
+ transcription = transcribe_audio(audio_path)
65
+
66
+ # Get worker address
67
+ worker_address = get_worker_address(model_name)
68
+ if not worker_address:
69
+ return f"Error: No worker available for model {model_name}", None
70
+
71
+ # Send request to worker
72
+ response = requests.post(
73
+ f"{worker_address}/generate_speech",
74
+ json={"prompt": transcription}
75
+ )
76
+
77
+ if response.status_code == 200:
78
+ result = response.json()
79
+ text_response = result.get("text", "No text response generated")
80
+ speech_url = result.get("speech_url")
81
+
82
+ # In a real implementation, we would handle the audio file
83
+ # For now, we'll just return the text response
84
+ return text_response, speech_url
85
+ else:
86
+ return f"Error: {response.text}", None
87
+ except Exception as e:
88
+ logger.error(f"Error in speech-to-speech processing: {str(e)}")
89
+ return f"Error: {str(e)}", None
90
+
91
+ def process_text_to_speech(text, model_name):
92
+ """Process text to speech generation."""
93
+ if not text:
94
+ return "Error: No text provided", None
95
+
96
+ try:
97
+ # Get worker address
98
+ worker_address = get_worker_address(model_name)
99
+ if not worker_address:
100
+ return f"Error: No worker available for model {model_name}", None
101
+
102
+ # Send request to worker
103
+ response = requests.post(
104
+ f"{worker_address}/generate_speech",
105
+ json={"prompt": text}
106
+ )
107
+
108
+ if response.status_code == 200:
109
+ result = response.json()
110
+ text_response = result.get("text", "No text response generated")
111
+ speech_url = result.get("speech_url")
112
+
113
+ # In a real implementation, we would handle the audio file
114
+ # For now, we'll just return the text response
115
+ return text_response, speech_url
116
+ else:
117
+ return f"Error: {response.text}", None
118
+ except Exception as e:
119
+ logger.error(f"Error in text-to-speech processing: {str(e)}")
120
+ return f"Error: {str(e)}", None
121
+
122
+ def create_chat_ui():
123
+ """Create the Gradio chat UI."""
124
+ available_models = list_models()
125
+ logger.info(f"Available models: {available_models}")
126
+
127
+ with gr.Blocks(css="footer {visibility: hidden}") as demo:
128
+ gr.Markdown("# 🦙🎧 LLaMA-Omni Speech Interaction Demo")
129
+
130
+ with gr.Row():
131
+ with gr.Column(scale=3):
132
+ # Input area
133
+ with gr.Tab("Speech Input"):
134
+ audio_input = gr.Audio(sources=["microphone", "upload"], type="filepath", label="Record or upload audio")
135
+ transcription_output = gr.Textbox(label="Transcription", interactive=False)
136
+
137
+ with gr.Tab("Text Input"):
138
+ text_input = gr.Textbox(label="Text Input", placeholder="Type your message here...")
139
+
140
+ # Common controls
141
+ with gr.Row():
142
+ model_selector = gr.Dropdown(choices=available_models, label="Model", value=available_models[0] if available_models else None)
143
+ submit_btn = gr.Button("Submit")
144
+
145
+ if model_list_mode == "reload":
146
+ refresh_btn = gr.Button("Refresh Models")
147
+
148
+ with gr.Column(scale=4):
149
+ # Output area
150
+ chatbot = gr.Chatbot(label="Conversation", height=500)
151
+
152
+ with gr.Row():
153
+ audio_output = gr.Audio(label="Generated Speech", interactive=False)
154
+
155
+ # Event handlers
156
+ def on_audio_input(audio):
157
+ if audio:
158
+ transcription = transcribe_audio(audio)
159
+ return transcription
160
+ return ""
161
+
162
+ def on_speech_submit(audio, model_name, chat_history):
163
+ if not audio:
164
+ return chat_history, None
165
+
166
+ transcription = transcribe_audio(audio)
167
+ text_response, speech_url = process_speech_to_speech(audio, model_name)
168
+
169
+ # Update chat history
170
+ new_history = chat_history.copy()
171
+ new_history.append((transcription, text_response))
172
+
173
+ # In a real implementation, we would handle the audio file
174
+ # For now, we'll just return None for audio output
175
+ return new_history, None
176
+
177
+ def on_text_submit(text, model_name, chat_history):
178
+ if not text:
179
+ return chat_history, None
180
+
181
+ text_response, speech_url = process_text_to_speech(text, model_name)
182
+
183
+ # Update chat history
184
+ new_history = chat_history.copy()
185
+ new_history.append((text, text_response))
186
+
187
+ # In a real implementation, we would handle the audio file
188
+ # For now, we'll just return None for audio output
189
+ return new_history, None
190
+
191
+ def on_refresh_models():
192
+ return gr.update(choices=list_models())
193
+
194
+ # Connect events
195
+ audio_input.change(on_audio_input, [audio_input], [transcription_output])
196
+
197
+ submit_btn.click(
198
+ fn=lambda audio, text, model, chat: on_speech_submit(audio, model, chat) if audio else on_text_submit(text, model, chat),
199
+ inputs=[audio_input, text_input, model_selector, chatbot],
200
+ outputs=[chatbot, audio_output]
201
+ )
202
+
203
+ if model_list_mode == "reload":
204
+ refresh_btn.click(on_refresh_models, [], [model_selector])
205
+
206
+ return demo
207
+
208
+ def main():
209
+ """Run the Gradio web server."""
210
+ global controller_url, vocoder_path, vocoder_cfg, model_list_mode
211
+
212
+ parser = argparse.ArgumentParser(description="LLaMA-Omni Gradio web server")
213
+ parser.add_argument("--host", type=str, default="0.0.0.0", help="Host to bind the server")
214
+ parser.add_argument("--port", type=int, default=8000, help="Port to bind the server")
215
+ parser.add_argument("--controller", type=str, required=True, help="Controller URL")
216
+ parser.add_argument("--vocoder", type=str, required=True, help="Path to vocoder model")
217
+ parser.add_argument("--vocoder-cfg", type=str, required=True, help="Path to vocoder config")
218
+ parser.add_argument("--model-list-mode", type=str, default="once", choices=["once", "reload"], help="Model listing mode")
219
+
220
+ args = parser.parse_args()
221
+
222
+ controller_url = args.controller
223
+ vocoder_path = args.vocoder
224
+ vocoder_cfg = args.vocoder_cfg
225
+ model_list_mode = args.model_list_mode
226
+
227
+ # Create the demo
228
+ demo = create_chat_ui()
229
+
230
+ # Launch the server
231
+ demo.launch(server_name=args.host, server_port=args.port, share=False)
232
+
233
+ if __name__ == "__main__":
234
+ main()
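
For reference, the controller endpoints that `list_models()` and `get_worker_address()` above rely on can be exercised directly. A minimal sketch, assuming a controller is already running on `http://localhost:10000` (the default address used elsewhere in this repo); only the endpoint names are taken from the code above:

```python
import requests

CONTROLLER_URL = "http://localhost:10000"  # assumed default controller address

# Ask the controller which models are currently registered by workers.
models = requests.get(f"{CONTROLLER_URL}/list_models", timeout=10).json().get("models", [])
print("Registered models:", models)

# Resolve a worker address for the first model, mirroring get_worker_address().
if models:
    worker = requests.get(
        f"{CONTROLLER_URL}/get_worker_address",
        params={"model_name": models[0]},
        timeout=10,
    ).json().get("worker_address")
    print("Worker serving", models[0], "->", worker)
```
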
omni_speech/serve/model_worker.py ADDED
@@ -0,0 +1,211 @@
1
+ import argparse
2
+ import json
3
+ import os
4
+ import time
5
+ import uuid
6
+ import requests
7
+ import threading
8
+ import transformers
9
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
10
+ import torch
11
+ from typing import Dict, List, Optional, Union
12
+ import traceback
13
+ from fastapi import FastAPI, HTTPException
14
+ from fastapi.middleware.cors import CORSMiddleware
15
+ import uvicorn
16
+ import logging
17
+
18
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
19
+ logger = logging.getLogger(__name__)
20
+
21
+ app = FastAPI()
22
+
23
+ app.add_middleware(
24
+ CORSMiddleware,
25
+ allow_origins=["*"],
26
+ allow_credentials=True,
27
+ allow_methods=["*"],
28
+ allow_headers=["*"],
29
+ )
30
+
31
+ # Global variables
32
+ model = None
33
+ tokenizer = None
34
+ model_name = None
35
+ model_path = None
36
+ device = "cuda" if torch.cuda.is_available() else "cpu"
37
+ controller_url = None
38
+ worker_url = None
39
+ worker_id = str(uuid.uuid4())[:8]
40
+ support_s2s = False
41
+
42
+ def load_model(model_path_arg, s2s=False):
43
+ """Load LLaMA-Omni model and tokenizer."""
44
+ global model, tokenizer, model_name, model_path, support_s2s
45
+
46
+ model_name = os.path.basename(model_path_arg)
47
+ model_path = model_path_arg
48
+ support_s2s = s2s
49
+
50
+ logger.info(f"Loading model {model_name} from {model_path}...")
51
+
52
+ # This is a placeholder for downloading the model
53
+ # In a real implementation, it would download from HuggingFace or another source
54
+ logger.info(f"Model would be downloaded from huggingface.co/ictnlp/Llama-3.1-8B-Omni")
55
+
56
+ try:
57
+ # Use placeholder values since we're not actually loading the model in this setup
58
+ tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
59
+ model = "PLACEHOLDER - Model would be loaded during actual deployment"
60
+
61
+ logger.info(f"Model {model_name} loaded successfully")
62
+ return True
63
+ except Exception as e:
64
+ logger.error(f"Error loading model: {str(e)}")
65
+ logger.error(traceback.format_exc())
66
+ return False
67
+
68
+ def register_worker():
69
+ """Register with the controller."""
70
+ global worker_id, controller_url, worker_url, model_name
71
+
72
+ logger.info(f"Registering worker {worker_id} with controller at {controller_url}")
73
+
74
+ while True:
75
+ try:
76
+ response = requests.post(
77
+ f"{controller_url}/register_worker",
78
+ json={
79
+ "name": worker_id,
80
+ "url": worker_url,
81
+ "models": [model_name] if model_name else []
82
+ }
83
+ )
84
+
85
+ if response.status_code == 200:
86
+ logger.info(f"Worker {worker_id} registered successfully")
87
+ break
88
+ else:
89
+ logger.error(f"Failed to register worker: {response.text}")
90
+ except Exception as e:
91
+ logger.error(f"Error registering worker: {str(e)}")
92
+
93
+ # Retry after a short delay
94
+ time.sleep(5)
95
+
96
+ def heartbeat_sender():
97
+ """Send heartbeats to the controller."""
98
+ global worker_id, controller_url
99
+
100
+ while True:
101
+ try:
102
+ response = requests.post(
103
+ f"{controller_url}/heartbeat",
104
+ json={"name": worker_id}
105
+ )
106
+
107
+ if response.status_code == 200:
108
+ logger.debug(f"Heartbeat sent successfully")
109
+ else:
110
+ logger.warning(f"Failed to send heartbeat: {response.text}")
111
+ except Exception as e:
112
+ logger.error(f"Error sending heartbeat: {str(e)}")
113
+
114
+ # Send heartbeat every 15 seconds
115
+ time.sleep(15)
116
+
117
+ @app.get("/status")
118
+ async def get_status():
119
+ """Get the status of the worker."""
120
+ return {
121
+ "status": "ok",
122
+ "model": model_name,
123
+ "supports_speech": support_s2s
124
+ }
125
+
126
+ @app.post("/generate_speech")
127
+ async def generate_speech(request_data: Dict):
128
+ """Generate speech response from a prompt."""
129
+ prompt = request_data.get("prompt")
130
+
131
+ if not prompt:
132
+ raise HTTPException(status_code=400, detail="Prompt is required")
133
+
134
+ try:
135
+ # This is a placeholder since we're not actually generating speech
136
+ # In a real implementation, it would process the prompt and return speech
137
+ logger.info(f"Received prompt: {prompt[:50]}...")
138
+
139
+ # Simulated response
140
+ response = {
141
+ "text": f"This is a response to: {prompt[:20]}...",
142
+ "speech_url": None, # In a real implementation, this would be the URL to the generated speech
143
+ "success": True
144
+ }
145
+
146
+ return response
147
+ except Exception as e:
148
+ logger.error(f"Error generating speech: {str(e)}")
149
+ logger.error(traceback.format_exc())
150
+ raise HTTPException(status_code=500, detail=f"Error generating speech: {str(e)}")
151
+
152
+ @app.post("/generate_text")
153
+ async def generate_text(request_data: Dict):
154
+ """Generate text response from a prompt."""
155
+ prompt = request_data.get("prompt")
156
+
157
+ if not prompt:
158
+ raise HTTPException(status_code=400, detail="Prompt is required")
159
+
160
+ try:
161
+ # This is a placeholder since we're not actually generating text
162
+ # In a real implementation, it would process the prompt and return text
163
+ logger.info(f"Received prompt: {prompt[:50]}...")
164
+
165
+ # Simulated response
166
+ response = {
167
+ "text": f"This is a response to: {prompt[:20]}...",
168
+ "success": True
169
+ }
170
+
171
+ return response
172
+ except Exception as e:
173
+ logger.error(f"Error generating text: {str(e)}")
174
+ logger.error(traceback.format_exc())
175
+ raise HTTPException(status_code=500, detail=f"Error generating text: {str(e)}")
176
+
177
+ def main():
178
+ """Run the model worker."""
179
+ global controller_url, worker_url
180
+
181
+ parser = argparse.ArgumentParser(description="LLaMA-Omni model worker")
182
+ parser.add_argument("--host", type=str, default="0.0.0.0", help="Host to bind the server")
183
+ parser.add_argument("--port", type=int, default=40000, help="Port to bind the server")
184
+ parser.add_argument("--controller", type=str, required=True, help="Controller URL")
185
+ parser.add_argument("--worker", type=str, required=True, help="Worker URL")
186
+ parser.add_argument("--model-path", type=str, required=True, help="Path or name of the model to load")
187
+ parser.add_argument("--model-name", type=str, required=True, help="Name to register the model as")
188
+ parser.add_argument("--s2s", action="store_true", help="Enable speech-to-speech support")
189
+
190
+ args = parser.parse_args()
191
+
192
+ controller_url = args.controller
193
+ worker_url = args.worker
194
+
195
+ # Load the model
196
+ if not load_model(args.model_path, args.s2s):
197
+ logger.error("Failed to load model. Exiting.")
198
+ return
199
+
200
+ # Register with the controller
201
+ register_worker()
202
+
203
+ # Start heartbeat thread
204
+ heartbeat_thread = threading.Thread(target=heartbeat_sender, daemon=True)
205
+ heartbeat_thread.start()
206
+
207
+ # Start the server
208
+ uvicorn.run(app, host=args.host, port=args.port, log_level="info")
209
+
210
+ if __name__ == "__main__":
211
+ main()
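
The worker's `/generate_speech` and `/generate_text` routes accept a JSON body with a `prompt` field. A minimal sketch of calling a running worker, assuming it listens on `http://localhost:40000` (the default port above):

```python
import requests

WORKER_URL = "http://localhost:40000"  # assumed default worker address

resp = requests.post(
    f"{WORKER_URL}/generate_speech",
    json={"prompt": "Hello, can you introduce yourself?"},
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
# The placeholder worker returns an echoed text response and, for now, no speech_url.
print(data["text"], data.get("speech_url"))
```
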
predict.py ADDED
@@ -0,0 +1,86 @@
1
+ import os
2
+ import time
3
+ import subprocess
4
+ import whisper
5
+ from cog import BasePredictor, Input, Path
6
+ import torch
7
+ import tempfile
8
+
9
+ class Predictor(BasePredictor):
10
+ def setup(self):
11
+ """Load the model into memory to make inference faster"""
12
+ print("Loading models...")
13
+
14
+ # Load whisper for audio transcription
15
+ print("Loading Whisper model...")
16
+ self.whisper_model = whisper.load_model("large-v3", download_root="models/speech_encoder/")
17
+
18
+ # In a real implementation, this would load the LLaMA-Omni model
19
+ print("Note: In a real deployment, the LLaMA-Omni model would be loaded here")
20
+
21
+ # Start the controller
22
+ print("Starting controller...")
23
+ self.controller_process = subprocess.Popen([
24
+ "python", "-m", "omni_speech.serve.controller",
25
+ "--host", "0.0.0.0",
26
+ "--port", "10000"
27
+ ])
28
+ time.sleep(5) # Wait for controller to start
29
+
30
+ # Start model worker
31
+ print("Starting model worker...")
32
+ self.model_worker_process = subprocess.Popen([
33
+ "python", "-m", "omni_speech.serve.model_worker",
34
+ "--host", "0.0.0.0",
35
+ "--controller", "http://localhost:10000",
36
+ "--port", "40000",
37
+ "--worker", "http://localhost:40000",
38
+ "--model-path", "Llama-3.1-8B-Omni",
39
+ "--model-name", "Llama-3.1-8B-Omni",
40
+ "--s2s"
41
+ ])
42
+ time.sleep(10) # Wait for model worker to start
43
+
44
+ print("Setup complete")
45
+
46
+ def predict(
47
+ self,
48
+ audio: Path = Input(description="Audio file for speech input", default=None),
49
+ text: str = Input(description="Text input (used if no audio is provided)", default=None),
50
+ ) -> str:
51
+ """Run inference on the model"""
52
+ if audio is None and not text:
53
+ return "Error: Please provide either an audio file or text input."
54
+
55
+ if audio is not None:
56
+ # Process audio input
57
+ print(f"Transcribing audio from {audio}...")
58
+
59
+ # Transcribe audio using Whisper
60
+ result = self.whisper_model.transcribe(str(audio))
61
+ transcription = result["text"]
62
+
63
+ print(f"Transcription: {transcription}")
64
+
65
+ # In a real implementation, this would process the transcription through LLaMA-Omni
66
+ # For this placeholder, we'll just return the transcription with a simulated response
67
+ response = f"Transcription: {transcription}\n\nResponse: This is a simulated response to your audio. In a real deployment, this would be processed through the LLaMA-Omni model."
68
+
69
+ return response
70
+ else:
71
+ # Process text input
72
+ print(f"Processing text: {text}")
73
+
74
+ # In a real implementation, this would process the text through LLaMA-Omni
75
+ # For this placeholder, we'll just return the text with a simulated response
76
+ response = f"Input: {text}\n\nResponse: This is a simulated response to your text. In a real deployment, this would be processed through the LLaMA-Omni model."
77
+
78
+ return response
79
+
80
+ def __del__(self):
81
+ """Clean up processes on shutdown"""
82
+ if hasattr(self, 'controller_process'):
83
+ self.controller_process.terminate()
84
+
85
+ if hasattr(self, 'model_worker_process'):
86
+ self.model_worker_process.terminate()
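
Outside of Cog's HTTP serving, the predictor can also be exercised directly from Python. A minimal sketch, with the caveat that `setup()` loads Whisper large-v3 and spawns the controller and worker subprocesses, so it is only practical on a machine prepared for that:

```python
from predict import Predictor

predictor = Predictor()
predictor.setup()  # loads Whisper and starts the controller/worker processes

# Text-only call; pass a Path to an audio file as `audio` to exercise transcription instead.
print(predictor.predict(audio=None, text="Hello from LLaMA-Omni"))
```
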
pyproject.toml ADDED
@@ -0,0 +1,30 @@
1
+ [build-system]
2
+ requires = ["setuptools>=42", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [tool.setuptools]
6
+ packages = ["omni_speech"]
7
+
8
+ [project]
9
+ name = "llama-omni"
10
+ version = "0.1.0"
11
+ description = "LLaMA-Omni: Seamless Speech Interaction with Large Language Models"
12
+ authors = [
13
+ {name = "Qingkai Fang", email = "[email protected]"},
14
+ ]
15
+ readme = "README.md"
16
+ requires-python = ">=3.10"
17
+ dependencies = [
18
+ "torch>=2.0.0",
19
+ "transformers>=4.34.0",
20
+ "accelerate>=0.21.0",
21
+ "gradio>=3.50.2",
22
+ "fastapi>=0.104.0",
23
+ "uvicorn>=0.23.2",
24
+ "pydantic>=2.3.0",
25
+ "openai-whisper>=0.0.1",
26
+ "numpy>=1.24.0",
27
+ "tqdm>=4.66.1",
28
+ "flash-attn>=2.3.0",
29
+ "fairseq>=0.12.2",
30
+ ]
requirements.txt CHANGED
@@ -1,15 +1,13 @@
1
  torch>=2.0.0
2
- torchaudio>=2.0.0
3
- transformers>=4.30.0
4
- tokenizers>=0.13.0
5
- gradio>=3.30.0
6
- huggingface-hub>=0.16.0
7
- safetensors>=0.3.1
 
8
  numpy>=1.24.0
9
- einops>=0.6.0
10
- diffusers>=0.18.0
11
- accelerate>=0.20.0
12
- soundfile>=0.12.1
13
- librosa>=0.10.0
14
- pydub
15
- ffmpeg-python
 
1
  torch>=2.0.0
2
+ transformers>=4.34.0
3
+ accelerate>=0.21.0
4
+ gradio>=3.50.2
5
+ fastapi>=0.104.0
6
+ uvicorn>=0.23.2
7
+ pydantic>=2.3.0
8
+ openai-whisper>=0.0.1
9
  numpy>=1.24.0
10
+ tqdm>=4.66.1
11
+ git+https://github.com/pytorch/fairseq.git
12
+ flash-attn>=2.3.0
13
+ requests>=2.31.0
 
 
 
run_without_downloads.sh DELETED
@@ -1,55 +0,0 @@
1
- #!/bin/bash
2
- # Script to run LLaMA-Omni2 without downloading models locally
3
- 
4
- # Set the NO_DOWNLOAD environment variable
5
- export NO_DOWNLOAD=1
6
- 
7
- # Check that the variable was set
8
- echo "Checking the NO_DOWNLOAD environment variable..."
9
- echo "NO_DOWNLOAD=$NO_DOWNLOAD"
10
- 
11
- # Enable verbose mode to verify the behavior
12
- export PYTHONVERBOSE=1
13
- export PYTHONPATH=$(pwd):$PYTHONPATH
14
- 
15
- # Create a temporary verification file
16
- python -c "
17
- import os
18
- with open('env_check.txt', 'w') as f:
19
-     f.write(f'NO_DOWNLOAD={os.environ.get(\"NO_DOWNLOAD\", \"not set\")}')
20
- "
21
- 
22
- # Show the contents of the verification file
23
- echo "Contents of the verification file:"
24
- cat env_check.txt
25
- 
26
- # Run the application
27
- echo "Running LLaMA-Omni2 in no-download mode (NO_DOWNLOAD=1)"
28
- echo "Models will be used directly from the Hugging Face Hub, without downloading them locally"
29
- echo "======================================================================"
30
- 
31
- # Decide which application to start
32
- if [ "$1" == "app" ] || [ "$1" == "" ]; then
33
-     echo "Starting app.py..."
34
-     # Check that the variable is visible to Python
35
-     python -c "import os; print('NO_DOWNLOAD environment variable:', os.environ.get('NO_DOWNLOAD', 'not set'))"
36
-     # Run with the environment variable set explicitly
37
-     NO_DOWNLOAD=1 python app.py
38
- elif [ "$1" == "launcher" ]; then
39
-     echo "Starting launcher..."
40
-     python -c "import os; print('NO_DOWNLOAD environment variable:', os.environ.get('NO_DOWNLOAD', 'not set'))"
41
-     # Use the command-line option
42
-     NO_DOWNLOAD=1 python launch_llama_omni2.py --no-model-download
43
- elif [ "$1" == "audio" ]; then
44
-     echo "Starting the audio interface..."
45
-     python -c "import os; print('NO_DOWNLOAD environment variable:', os.environ.get('NO_DOWNLOAD', 'not set'))"
46
-     NO_DOWNLOAD=1 python audio_interface.py
47
- else
48
-     echo "Usage: $0 [app|launcher|audio]"
49
-     echo "  app      - Starts app.py (default)"
50
-     echo "  launcher - Starts launch_llama_omni2.py"
51
-     echo "  audio    - Starts audio_interface.py"
52
- fi
53
- 
54
- # Remove the temporary verification file
55
- rm -f env_check.txt
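
The script's only contract with the applications it launches is the `NO_DOWNLOAD` variable. A minimal sketch of the pattern those applications are expected to follow (the branch contents and the Hub repo id are illustrative assumptions, not code from app.py):

```python
import os

# Read the flag exported by the shell script above.
no_download = os.environ.get("NO_DOWNLOAD", "0") == "1"

if no_download:
    # Stream weights straight from the Hugging Face Hub (hypothetical repo id).
    model_source = "ICTNLP/LLaMA-Omni2-0.5B"
else:
    # Use a locally downloaded copy under models/.
    model_source = "models/LLaMA-Omni2-0.5B"

print("Loading model from:", model_source)
```
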
tests/README.md DELETED
@@ -1,116 +0,0 @@
1
- # Testing LLaMA-Omni2-0.5B on Hugging Face
2
- 
3
- This directory contains a complete script for testing the LLaMA-Omni2-0.5B model deployed on Hugging Face.
4
- 
5
- ## Script Features
6
- 
7
- - Programmatic API testing (api mode)
8
- - Manual testing in the browser (manual mode)
9
- - Local audio transcription with Whisper
10
- - Sending text directly to the model
11
- - Saving the transcription and responses for reference
12
- 
13
- ## Prerequisites
14
- 
15
- Before running the test script, make sure the required dependencies are installed:
16
- 
17
- ```bash
18
- pip install requests gradio-client
19
- ```
20
- 
21
- For audio transcription (optional), you can install Whisper:
22
- 
23
- ```bash
24
- pip install openai-whisper
25
- ```
26
- 
27
- ## Usage
28
- 
29
- You can run the test script with the following command:
30
- 
31
- ```bash
32
- cd tests
33
- python test_llama_omni_api.py
34
- ```
35
- 
36
- By default, the script runs both modes (api and manual) and will:
37
- 1. Try to transcribe the test.mp3 file using Whisper (if available)
38
- 2. Fall back to a default test message if Whisper is unavailable or the file does not exist
39
- 3. Test the API programmatically and save the response
40
- 4. Save the input text to a file for easy copying
41
- 5. Open the LLaMA-Omni2-0.5B web interface on Hugging Face in your browser
42
- 6. Print instructions for manual testing
43
- 
44
- ### Command-line parameters
45
- 
46
- The script accepts the following command-line arguments:
47
- 
48
- `--api-url`: URL of the Gradio interface (default: https://marcosremar2-llama-omni.hf.space)
49
- `--audio-file`: Path to the audio file to transcribe locally (default: test.mp3)
50
- `--text`: Text to use directly (instead of transcribing audio)
51
- `--output-dir`: Directory where the transcription and responses are saved (default: ./output)
52
- `--mode`: Test mode: api (programmatic), manual (browser) or both (default: both)
53
- 
54
- ### Usage examples with custom parameters:
55
- 
56
- ```bash
57
- # Using direct text input
58
- python test_llama_omni_api.py --text "Hello, this is a test message for LLaMA-Omni2-0.5B."
59
- 
60
- # Using a custom audio file for transcription
61
- python test_llama_omni_api.py --audio-file /path/to/your/audio.mp3
62
- 
63
- # Testing only the API mode programmatically
64
- python test_llama_omni_api.py --mode api
65
- 
66
- # Only opening the web interface with custom text
67
- python test_llama_omni_api.py --mode manual --text "Manual test of LLaMA-Omni2-0.5B"
68
- ```
69
- 
70
- ## Test Modes
71
- 
72
- ### 1. API Mode (Programmatic)
73
- 
74
- Sends a request directly to the model API and saves the response to a file:
75
- 
76
- - Connects to the Gradio API with an increased timeout
77
- - Lists the available endpoints
78
- - Sends the text to the generation endpoint
79
- - Saves the received response to a file
80
- - Also queries basic model information
81
- 
82
- ### 2. Manual Mode (Web Interface)
83
- 
84
- Facilitates manual testing with the following workflow:
85
- 
86
- 1. **Text preparation**: The input text is saved to a file for easy copying
87
- 2. **Browser launch**: The script opens the web interface in your default browser
88
- 3. **Manual interaction**: You need to manually:
89
-    - Copy the text from the saved file
90
-    - Paste it into the "Input Text" field in the web interface
91
-    - Click the "Generate" button
92
-    - Wait for the response
93
-    - Copy and save the response for your records
94
- 
95
- ## Troubleshooting
96
- 
97
- If you run into problems:
98
- 
99
- 1. Check that the web interface URL is correct and the service is running
100
- 2. Make sure you have an internet connection
101
- 3. If you are using audio transcription, make sure Whisper is installed correctly
102
- 4. In API mode, check that the Gradio Space is active (Spaces sometimes "sleep" when idle)
103
- 
104
- ## Common Errors
105
- 
106
- ### Missing Dependencies
107
- 
108
- If you see errors about missing modules, install the required dependencies:
109
- 
110
- ```bash
111
- pip install requests gradio-client openai-whisper
112
- ```
113
- 
114
- ### Deploying to Hugging Face
115
- 
116
- This script only tests an LLaMA-Omni2-0.5B model that is already deployed on Hugging Face. To deploy the model on Hugging Face Spaces, you just need to push your code to the corresponding repository on Hugging Face.
tests/test.mp3 DELETED
Binary file (13.5 kB)
 
tests/test_llama_omni_api.py DELETED
@@ -1,223 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Complete test for LLaMA-Omni2-0.5B on Hugging Face
4
- This script can:
5
- 1. Transcribe audio locally and send it to the model
6
- 2. Send text directly to the model
7
- 3. Facilitate manual testing through the web interface
8
- 4. Test the API directly in a programmatic way
9
- """
10
- 
11
- import os
12
- import sys
13
- import time
14
- import argparse
15
- import requests
16
- import subprocess
17
- import webbrowser
18
- from pathlib import Path
19
- from gradio_client import Client
20
- 
21
- # Default settings
22
- DEFAULT_API_URL = "https://marcosremar2-llama-omni.hf.space"
23
- DEFAULT_OUTPUT_DIR = "./output"
24
- MODEL_NAME = "LLaMA-Omni2-0.5B"
25
- 
26
- def transcribe_audio_locally(audio_file_path):
27
-     """
28
-     Transcribe audio locally using Whisper if it is available
29
-     Otherwise, return a default message
30
-     """
31
-     try:
32
-         # Try to use the whisper CLI if available
33
-         result = subprocess.run(
34
-             ["whisper", audio_file_path, "--model", "tiny", "--output_format", "txt"],
35
-             capture_output=True,
36
-             text=True,
37
-             check=True
38
-         )
39
-         transcript_file = f"{os.path.splitext(audio_file_path)[0]}.txt"
40
-         if os.path.exists(transcript_file):
41
-             with open(transcript_file, "r") as f:
42
-                 transcript = f.read().strip()
43
-             print(f"Transcription: {transcript}")
44
-             return transcript
45
-     except (subprocess.CalledProcessError, FileNotFoundError) as e:
46
-         print(f"Whisper not available or error: {e}")
47
- 
48
-     # Default message
49
-     print("Using the default test message since whisper is not available")
50
-     return f"Hello, I am testing the {MODEL_NAME} model. Can you reply to me in Portuguese?"
51
- 
52
- def check_url_accessibility(url):
53
-     """Check whether the URL is reachable"""
54
-     try:
55
-         response = requests.get(url, timeout=10)
56
-         if response.status_code == 200:
57
-             return True
58
-         else:
59
-             print(f"URL returned status code {response.status_code}")
60
-             return False
61
-     except Exception as e:
62
-         print(f"Error accessing URL: {e}")
63
-         return False
64
- 
65
- def save_text_to_file(text, output_dir, filename="text.txt"):
66
-     """Save text to a file for easy copying"""
67
-     os.makedirs(output_dir, exist_ok=True)
68
-     filepath = os.path.join(output_dir, filename)
69
- 
70
-     with open(filepath, "w") as f:
71
-         f.write(text)
72
- 
73
-     print(f"Text saved to: {filepath}")
74
-     return filepath
75
- 
76
- def test_api_programmatically(api_url, text_input, output_dir=DEFAULT_OUTPUT_DIR):
77
-     """
78
-     Test the model API programmatically by sending a text prompt
79
-     and saving the response
80
-     """
81
-     output_path = os.path.join(output_dir, f"response_{int(time.time())}.txt")
82
-     os.makedirs(output_dir, exist_ok=True)
83
- 
84
-     print(f"Testing API at: {api_url}")
85
-     print(f"Input text: {text_input[:50]}..." if len(text_input) > 50 else f"Input text: {text_input}")
86
- 
87
-     try:
88
-         # Connect to the Gradio app with an increased timeout
89
-         client = Client(
90
-             api_url,
91
-             httpx_kwargs={"timeout": 300.0}  # 5 minute timeout
92
-         )
93
- 
94
-         print("Connected to the API successfully")
95
- 
96
-         # List the available endpoints
97
-         print("Available endpoints:")
98
-         client.view_api()
99
- 
100
-         # Send the prompt to the model
101
-         print("\nUsing the text generation endpoint (/lambda_1)...")
102
-         print(f"Sending prompt: '{text_input[:50]}...'")
103
-         job = client.submit(
104
-             text_input,
105
-             MODEL_NAME,
106
-             api_name="/lambda_1"
107
-         )
108
- 
109
-         print("Request sent, waiting for the response...")
110
-         result = job.result()
111
-         print(f"Response received (length: {len(str(result))} characters)")
112
- 
113
-         # Save the response to a file
114
-         with open(output_path, "w") as f:
115
-             f.write(str(result))
116
- 
117
-         print(f"Response saved to: {output_path}")
118
- 
119
-         # Try to fetch basic model information
120
-         try:
121
-             print("\nQuerying model information...")
122
-             model_info = client.submit(api_name="/lambda").result()
123
-             print(f"Model information: {model_info}")
124
-         except Exception as model_error:
125
-             print(f"Error fetching model information: {str(model_error)}")
126
- 
127
-         return True, result
128
- 
129
-     except Exception as e:
130
-         print(f"Error during API request: {str(e)}")
131
-         print("This can happen because the Space is sleeping and needs time to start up.")
132
-         print("Try accessing the Space directly first: " + api_url)
133
-         print(f"\nNote: This API is for the {MODEL_NAME} model and does not process audio directly.")
134
-         print("To work with audio, you would first need to transcribe it using Whisper,")
135
-         print("and then send the transcribed text to this API.")
136
-         return False, None
137
- 
138
- def test_manual_interface(api_url, text_input, output_dir=DEFAULT_OUTPUT_DIR):
139
-     """
140
-     Prepare a manual test of the model via the web interface:
141
-     1. Save the text to a file for easy copying
142
-     2. Open the web interface for manual testing
143
-     """
144
-     # Check whether the URL is reachable
145
-     print(f"Checking accessibility of {api_url}...")
146
-     if not check_url_accessibility(api_url):
147
-         print(f"Warning: {api_url} is not reachable. Manual testing may not be possible.")
148
- 
149
-     # Save the text to a file for easy copying
150
-     transcript_file = save_text_to_file(text_input, output_dir, "transcription.txt")
151
- 
152
-     # Instructions for manual testing
153
-     print("\n" + "=" * 50)
154
-     print(f"INSTRUCTIONS FOR MANUAL TESTING OF {MODEL_NAME}")
155
-     print("=" * 50)
156
-     print(f"1. The text was saved to: {transcript_file}")
157
-     print(f"2. Opening {api_url} in the browser...")
158
-     print("3. Copy the text from the saved file and paste it into the 'Input Text' field")
159
-     print("4. Click the 'Generate' button")
160
-     print("5. When you receive the response, copy and save it for your records")
161
-     print("=" * 50 + "\n")
162
- 
163
-     # Open the URL in the default browser
164
-     try:
165
-         webbrowser.open(api_url)
166
-         return True
167
-     except Exception as e:
168
-         print(f"Error opening the browser: {e}")
169
-         print(f"Please visit manually: {api_url}")
170
-         return False
171
- 
172
- def main():
173
-     parser = argparse.ArgumentParser(description=f"Test for {MODEL_NAME} on Hugging Face")
174
-     parser.add_argument("--api-url", type=str, default=DEFAULT_API_URL,
175
-                         help=f"URL of the Gradio interface (default: {DEFAULT_API_URL})")
176
-     parser.add_argument("--audio-file", type=str, default="test.mp3",
177
-                         help="Path to the audio file to transcribe locally (optional)")
178
-     parser.add_argument("--text", type=str, default=None,
179
-                         help="Text to use directly (instead of transcribing audio)")
180
-     parser.add_argument("--output-dir", type=str, default=DEFAULT_OUTPUT_DIR,
181
-                         help="Directory where the transcription and responses are saved")
182
-     parser.add_argument("--mode", type=str, choices=["api", "manual", "both"], default="both",
183
-                         help="Test mode: api (programmatic), manual (browser) or both")
184
-     args = parser.parse_args()
185
- 
186
-     # Convert relative paths to absolute paths
187
-     if args.audio_file and not os.path.isabs(args.audio_file):
188
-         if not os.path.exists(args.audio_file):
189
-             script_dir = os.path.dirname(os.path.abspath(__file__))
190
-             args.audio_file = os.path.join(script_dir, args.audio_file)
191
- 
192
-     if args.output_dir and not os.path.isabs(args.output_dir):
193
-         script_dir = os.path.dirname(os.path.abspath(__file__))
194
-         args.output_dir = os.path.join(script_dir, args.output_dir)
195
- 
196
-     # Get the input text from the transcription or from the --text parameter
197
-     input_text = args.text
198
-     if not input_text and args.audio_file:
199
-         if os.path.exists(args.audio_file):
200
-             input_text = transcribe_audio_locally(args.audio_file)
201
-         else:
202
-             print(f"Audio file not found: {args.audio_file}")
203
-             input_text = f"Hello, I am testing the {MODEL_NAME} model. Can you reply to me in Portuguese?"
204
-     if not input_text:
205
-         input_text = f"Hello, I am testing the {MODEL_NAME} model. Can you reply to me in Portuguese?"
206
- 
207
-     print(f"Input text: {input_text}")
208
- 
209
-     # Run the tests according to the selected mode
210
-     success = True
211
-     if args.mode in ["api", "both"]:
212
-         api_success, _ = test_api_programmatically(args.api_url, input_text, args.output_dir)
213
-         success = success and api_success
214
- 
215
-     if args.mode in ["manual", "both"]:
216
-         manual_success = test_manual_interface(args.api_url, input_text, args.output_dir)
217
-         success = success and manual_success
218
- 
219
-     # Exit with the appropriate status code
220
-     sys.exit(0 if success else 1)
221
- 
222
- if __name__ == "__main__":
223
-     main()