Prathamesh Sarjerao Vaidya committed · Commit 321254f · Parent(s): da625ea

fix docker write error

Files changed:
- DOCUMENTATION.md +4 -4
- Dockerfile +20 -3
- README.md +35 -52
- TECHNICAL_UNDERSTANDING.md +2 -2
- spaces.yaml +1 -1
- static/imgs/banner.png +2 -2

DOCUMENTATION.md CHANGED

@@ -1,12 +1,12 @@
-# 
+# Multilingual Audio Intelligence System - Technical Documentation
 
 ## 1. Project Overview
 
-The 
+The Multilingual Audio Intelligence System is an AI-powered platform that combines speaker diarization, automatic speech recognition, and neural machine translation to deliver comprehensive audio analysis capabilities. This system processes multilingual audio content with support for Indian languages, identifies individual speakers, transcribes speech with high accuracy, and provides translations across 100+ languages through a multi-tier fallback system, transforming raw audio into structured, actionable insights.
 
 ## 2. Objective
 
-The primary objective of the 
+The primary objective of the Multilingual Audio Intelligence System is to provide comprehensive audio content analysis capabilities by:
 
 - **Language Support**: Support for Tamil, Hindi, Telugu, Gujarati, Kannada, and other regional languages
 - **Multi-Tier Translation**: Fallback system ensuring broad translation coverage across language pairs
@@ -180,7 +180,7 @@ The application includes a demo mode for testing without waiting for full model
 - Available demos:
 - [Yuri_Kizaki.mp3](https://www.mitsue.co.jp/service/audio_and_video/audio_production/media/narrators_sample/yuri_kizaki/03.mp3) – Japanese narration about website communication
 - [Film_Podcast.mp3](https://www.lightbulblanguages.co.uk/resources/audio/film-podcast.mp3) – French podcast discussing films like The Social Network
-- [Tamil_Wikipedia_Interview.ogg](https://commons.wikimedia.org/wiki/File:
+- [Tamil_Wikipedia_Interview.ogg](https://commons.wikimedia.org/wiki/File:Parvathisri-Wikipedia-Interview-Vanavil-fm.ogg) – Tamil language interview (36+ minutes)
 - [Car_Trouble.mp3](https://www.tuttlepublishing.com/content/docs/9780804844383/06-18%20Part2%20Car%20Trouble.mp3) – Conversation about waiting for a mechanic and basic assistance (2:45)
 - Static serving: demo audio is exposed at `/demo_audio/<filename>` for local preview.
 - The UI provides enhanced selectable cards under Demo Mode; once selected, the system loads a preview and renders a waveform using HTML5 Canvas (Web Audio API) before processing.

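The demo-mode notes above describe demo audio being exposed at `/demo_audio/<filename>` so the UI can preview a file and draw its waveform. As a rough illustration only (this commit does not show `web_app.py`, and the real app may be wired differently), a FastAPI static mount along these lines is one common way to provide that route:

```python
# Hypothetical sketch of serving demo_audio/ at /demo_audio/<filename>;
# illustrative only, not taken from the project's web_app.py.
from pathlib import Path

from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI()

demo_dir = Path("demo_audio")
demo_dir.mkdir(exist_ok=True)  # same directory the Dockerfile below creates with mkdir -p

# Every file placed in demo_audio/ becomes reachable at /demo_audio/<filename>,
# which the demo cards can fetch for preview and HTML5 Canvas waveform rendering.
app.mount("/demo_audio", StaticFiles(directory=str(demo_dir)), name="demo_audio")
```

Keeping the demo files on the same FastAPI app means they are served from the single port (7860) that the Space already exposes.
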
Dockerfile CHANGED

@@ -24,8 +24,12 @@ RUN pip install --no-cache-dir --upgrade pip && \
 COPY . .
 
 # Create necessary directories with proper permissions
+# Fixed: Use 777 permissions for directories that need write access
 RUN mkdir -p templates static uploads outputs model_cache temp_files demo_results demo_audio \
-
+    /tmp/matplotlib /tmp/fontconfig \
+    && chmod -R 777 templates static \
+    && chmod -R 777 uploads outputs model_cache temp_files demo_results demo_audio \
+    && chmod -R 777 /tmp/matplotlib /tmp/fontconfig
 
 # Set environment variables for Hugging Face Spaces
 ENV PYTHONPATH=/app \
@@ -44,7 +48,8 @@ ENV PYTHONPATH=/app \
     PYANNOTE_CACHE=/app/model_cache \
     MPLCONFIGDIR=/tmp/matplotlib \
     HUGGINGFACE_HUB_CACHE=/app/model_cache \
-    HF_HUB_CACHE=/app/model_cache
+    HF_HUB_CACHE=/app/model_cache \
+    FONTCONFIG_PATH=/tmp/fontconfig
 
 # Expose port for Hugging Face Spaces
 EXPOSE 7860
@@ -54,4 +59,16 @@ HEALTHCHECK --interval=30s --timeout=30s --start-period=60s --retries=3 \
     CMD curl -f http://localhost:7860/api/system-info || exit 1
 
 # Preload models and start the application
-
+# Fixed: Ensure directories exist with proper permissions at runtime
+CMD ["python", "-c", "\
+import os; \
+import subprocess; \
+import time; \
+print('🚀 Starting Multilingual Audio Intelligence System...'); \
+for dir in ['uploads', 'outputs', 'model_cache', 'temp_files', 'demo_results', '/tmp/matplotlib', '/tmp/fontconfig']: \
+    os.makedirs(dir, mode=0o777, exist_ok=True); \
+subprocess.run(['python', 'model_preloader.py']); \
+print('✅ Models loaded successfully'); \
+import uvicorn; \
+uvicorn.run('web_app:app', host='0.0.0.0', port=7860, workers=1, log_level='info')\
+"]

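The new CMD above packs directory creation, model preloading, and the uvicorn launch into one `python -c` string held together by Dockerfile line continuations. An alternative worth noting (purely a sketch; this commit does not add such a file, and `start.py` is a hypothetical name) is to keep the same startup logic in a small script copied into the image, which is easier to read and debug than a single collapsed command string:

```python
# start.py - hypothetical equivalent of the inline CMD above (sketch only).
import os
import subprocess

import uvicorn

# Recreate writable directories at runtime, mirroring the chmod 777 fix above.
for d in ["uploads", "outputs", "model_cache", "temp_files",
          "demo_results", "/tmp/matplotlib", "/tmp/fontconfig"]:
    os.makedirs(d, mode=0o777, exist_ok=True)

print("Starting Multilingual Audio Intelligence System...")
subprocess.run(["python", "model_preloader.py"], check=False)  # preload models
print("Models loaded successfully")

# Serve the FastAPI app on the port exposed for Hugging Face Spaces.
uvicorn.run("web_app:app", host="0.0.0.0", port=7860, workers=1, log_level="info")
```

With a file like that in place, the Dockerfile's last instruction would shrink to `CMD ["python", "start.py"]`.
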
README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: 
+title: Multilingual Audio Intelligence System
 emoji: 🎵
 colorFrom: blue
 colorTo: purple
@@ -8,7 +8,7 @@ pinned: false
 short_description: AI for multilingual transcription & Indian language support
 ---
 
-# 🎵 
+# 🎵 Multilingual Audio Intelligence System
 
 <p align="center">
   <img src="static/imgs/banner.png" alt="Multilingual Audio Intelligence System Banner" style="border: 1px solid black"/>
@@ -48,10 +48,10 @@ This AI-powered platform combines speaker diarization, automatic speech recognit
 
 The system includes sample audio files for testing and demonstration:
 
-- 
-- 
-- 
-- 
+- [Japanese Business Audio](https://www.mitsue.co.jp/service/audio_and_video/audio_production/media/narrators_sample/yuri_kizaki/03.mp3): Professional voice message about website communication
+- [French Film Podcast](https://www.lightbulblanguages.co.uk/resources/audio/film-podcast.mp3): Discussion about movies including Social Network and Paranormal Activity
+- [Tamil Wikipedia Interview](https://commons.wikimedia.org/wiki/File:Parvathisri-Wikipedia-Interview-Vanavil-fm.ogg): Tamil language interview on collaborative knowledge sharing (36+ minutes)
+- [Hindi Car Trouble](https://www.tuttlepublishing.com/content/docs/9780804844383/06-18%20Part2%20Car%20Trouble.mp3): Hindi conversation about daily life scenarios (2:45)
 
 ### Demo Features
 - **Pre-processed Results**: Cached processing for quick demonstration
@@ -111,7 +111,7 @@ The system includes sample audio files for testing and demonstration
 ### **1. Environment Setup**
 ```bash
 # Clone the enhanced repository
-git clone https://github.com/
+git clone https://github.com/Prathameshv07/Multilingual-Audio-Intelligence-System.git
 cd Enhanced-Multilingual-Audio-Intelligence-System
 
 # Create conda environment (recommended)
@@ -153,34 +153,34 @@ python run_app.py --mode test # System testing
 
 ```
 Enhanced-Multilingual-Audio-Intelligence-System/
-├── run_app.py
-├── web_app.py
-├── src/
-│   ├── main.py
-│   ├── audio_processor.py
-│   ├── speaker_diarizer.py
-│   ├── speech_recognizer.py
-│   ├── translator.py
-│   ├── output_formatter.py
-│   ├── demo_manager.py
-│   ├── ui_components.py
-│   └── utils.py
-├── demo_audio/
-│   ├── Yuri_Kizaki.mp3
-│   ├── Film_Podcast.mp3
-│   ├── Tamil_Wikipedia_Interview.ogg #
-│   └── Car_Trouble.mp3
+├── run_app.py                        # Single entry point for all modes
+├── web_app.py                        # Enhanced FastAPI application
+├── src/                              # Organized source modules
+│   ├── main.py                       # Enhanced pipeline orchestrator
+│   ├── audio_processor.py            # Enhanced with smart file management
+│   ├── speaker_diarizer.py           # pyannote.audio integration
+│   ├── speech_recognizer.py          # faster-whisper integration
+│   ├── translator.py                 # 3-tier hybrid translation system
+│   ├── output_formatter.py           # Multi-format output generation
+│   ├── demo_manager.py               # Enhanced demo file management
+│   ├── ui_components.py              # Interactive UI components
+│   └── utils.py                      # Enhanced utility functions
+├── demo_audio/                       # Enhanced demo files
+│   ├── Yuri_Kizaki.mp3               # Japanese business communication
+│   ├── Film_Podcast.mp3              # French cinema discussion
+│   ├── Tamil_Wikipedia_Interview.ogg # Tamil language interview
+│   └── Car_Trouble.mp3               # Hindi daily conversation
 ├── templates/
-│   └── index.html
+│   └── index.html                    # Enhanced UI with Indian language support
 ├── static/
-│   └── imgs/
-├── model_cache/
-├── outputs/
-├── requirements.txt
-├── README.md
-├── DOCUMENTATION.md
-├── TECHNICAL_UNDERSTANDING.md
-└── files_which_are_not_needed/
+│   └── imgs/                         # Enhanced screenshots and assets
+├── model_cache/                      # Intelligent model caching
+├── outputs/                          # Processing results
+├── requirements.txt                  # Enhanced dependencies
+├── README.md                         # This enhanced documentation
+├── DOCUMENTATION.md                  # Comprehensive technical docs
+├── TECHNICAL_UNDERSTANDING.md        # System architecture guide
+└── files_which_are_not_needed/       # Archived legacy files
 ```
 
 ## 🚀 Enhanced Usage Examples
@@ -246,23 +246,6 @@ MAX_FILE_SIZE_MB=200 # Smart file size limit
 - **Device Selection**: CPU (recommended), CUDA (if available)
 - **Cache Management**: Automatic model caching and cleanup
 
-## Problem Statement 6 Alignment
-
-This system addresses **PS-6: "Language-Agnostic Speaker Identification/Verification & Diarization; and subsequent Transcription & Translation System"** with the following capabilities:
-
-### **Current Implementation (70% Coverage)**
-- ✅ **Speaker Diarization**: pyannote.audio for "who spoke when" analysis
-- ✅ **Multilingual ASR**: faster-whisper with automatic language detection
-- ✅ **Neural Translation**: Multi-tier system for 100+ languages
-- ✅ **Audio Format Support**: WAV, MP3, OGG, FLAC, M4A
-- ✅ **User Interface**: Transcripts, visualizations, and translations
-
-### **Enhanced Features (95% Complete)**
-- ✅ **Advanced Speaker Verification**: Multi-model speaker identification with SpeechBrain, Wav2Vec2, and enhanced feature extraction
-- ✅ **Advanced Noise Reduction**: ML-based enhancement with Sepformer, Demucs, and advanced signal processing
-- ✅ **Enhanced Code-switching**: Improved support for mixed language audio with context awareness
-- ✅ **Performance Optimization**: Real-time processing with advanced caching and optimization
-
 ## System Advantages
 
 ### **Reliability**
@@ -335,7 +318,7 @@ docker run -p 8000:7860 audio-intelligence
 ### **Hugging Face Spaces**
 ```yaml
 # spaces.yaml
-title: 
+title: Multilingual Audio Intelligence System
 emoji: 🎵
 colorFrom: blue
 colorTo: purple
@@ -368,4 +351,4 @@ This enhanced system is released under MIT License - see the [LICENSE](LICENSE)
 
 ---
 
-**A comprehensive solution for multilingual audio analysis and translation, designed to handle diverse language requirements and processing scenarios.**
+**A comprehensive solution for multilingual audio analysis and translation, designed to handle diverse language requirements and processing scenarios.**

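The README tree above labels `src/translator.py` as a "3-tier hybrid translation system", and DOCUMENTATION.md describes the translation layer as a multi-tier fallback. The tiers themselves are not shown in this diff, so the following is only an illustrative sketch of the fallback pattern (every function name here is a placeholder, not one of the project's actual backends): each tier is tried in order, and any failure falls through to the next, ending in an untranslated pass-through rather than an error.

```python
# Illustrative multi-tier translation fallback; placeholder tiers only,
# not the actual backends implemented in src/translator.py.
from typing import Callable, List, Optional

TierFn = Callable[[str, str, str], str]


def tier_primary(text: str, src: str, tgt: str) -> str:
    raise NotImplementedError("e.g. a dedicated neural MT model for this language pair")


def tier_secondary(text: str, src: str, tgt: str) -> str:
    raise NotImplementedError("e.g. a broad multilingual fallback model")


def tier_pass_through(text: str, src: str, tgt: str) -> str:
    return text  # last resort: return the original text instead of failing


def translate(text: str, src: str, tgt: str,
              tiers: Optional[List[TierFn]] = None) -> str:
    """Try each tier in order; the first one that succeeds wins."""
    for tier in tiers or [tier_primary, tier_secondary, tier_pass_through]:
        try:
            return tier(text, src, tgt)
        except Exception:
            continue  # fall through to the next tier
    return text
```

In this stub, `translate("bonjour", "fr", "en")` falls all the way through to the pass-through tier, since the first two tiers are deliberately unimplemented.
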
TECHNICAL_UNDERSTANDING.md CHANGED

@@ -1,8 +1,8 @@
-# Technical Understanding - 
+# Technical Understanding - Multilingual Audio Intelligence System
 
 ## Architecture Overview
 
-This document provides technical insights into the 
+This document provides technical insights into the multilingual audio intelligence system, designed to address comprehensive audio analysis requirements. The system incorporates **Indian language support**, **multi-tier translation**, **waveform visualization**, and **optimized performance** for various deployment scenarios.
 
 ## System Architecture
 

spaces.yaml CHANGED

@@ -1,4 +1,4 @@
-title: 
+title: Multilingual Audio Intelligence System
 emoji: 🎵
 colorFrom: blue
 colorTo: purple

static/imgs/banner.png CHANGED

Binary image updated (stored via Git LFS).