Prathamesh Sarjerao Vaidya committed on
Commit
321254f
·
1 Parent(s): da625ea

fix docker write error

DOCUMENTATION.md CHANGED
@@ -1,12 +1,12 @@
- # Enhanced Multilingual Audio Intelligence System - Technical Documentation

  ## 1. Project Overview

- The Enhanced Multilingual Audio Intelligence System is an AI-powered platform that combines speaker diarization, automatic speech recognition, and neural machine translation to deliver comprehensive audio analysis capabilities. This system processes multilingual audio content with support for Indian languages, identifies individual speakers, transcribes speech with high accuracy, and provides translations across 100+ languages through a multi-tier fallback system, transforming raw audio into structured, actionable insights.

  ## 2. Objective

- The primary objective of the Enhanced Multilingual Audio Intelligence System is to provide comprehensive audio content analysis capabilities by:

  - **Language Support**: Support for Tamil, Hindi, Telugu, Gujarati, Kannada, and other regional languages
  - **Multi-Tier Translation**: Fallback system ensuring broad translation coverage across language pairs
@@ -180,7 +180,7 @@ The application includes a demo mode for testing without waiting for full model
  - Available demos:
  - [Yuri_Kizaki.mp3](https://www.mitsue.co.jp/service/audio_and_video/audio_production/media/narrators_sample/yuri_kizaki/03.mp3) — Japanese narration about website communication
  - [Film_Podcast.mp3](https://www.lightbulblanguages.co.uk/resources/audio/film-podcast.mp3) — French podcast discussing films like The Social Network
- - [Tamil_Wikipedia_Interview.ogg](https://commons.wikimedia.org/wiki/File:Tamil_Wikipedia_Interview.ogg) — Tamil language interview (36+ minutes)
  - [Car_Trouble.mp3](https://www.tuttlepublishing.com/content/docs/9780804844383/06-18%20Part2%20Car%20Trouble.mp3) — Conversation about waiting for a mechanic and basic assistance (2:45)
  - Static serving: demo audio is exposed at `/demo_audio/<filename>` for local preview.
  - The UI provides enhanced selectable cards under Demo Mode; once selected, the system loads a preview and renders a waveform using HTML5 Canvas (Web Audio API) before processing.
 
+ # Multilingual Audio Intelligence System - Technical Documentation

  ## 1. Project Overview

+ The Multilingual Audio Intelligence System is an AI-powered platform that combines speaker diarization, automatic speech recognition, and neural machine translation to deliver comprehensive audio analysis capabilities. This system processes multilingual audio content with support for Indian languages, identifies individual speakers, transcribes speech with high accuracy, and provides translations across 100+ languages through a multi-tier fallback system, transforming raw audio into structured, actionable insights.

  ## 2. Objective

+ The primary objective of the Multilingual Audio Intelligence System is to provide comprehensive audio content analysis capabilities by:

  - **Language Support**: Support for Tamil, Hindi, Telugu, Gujarati, Kannada, and other regional languages
  - **Multi-Tier Translation**: Fallback system ensuring broad translation coverage across language pairs
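
The multi-tier fallback mentioned in the hunk above could look roughly like the sketch below. It is an illustration only, not the project's `translator.py`: `translate_with_fallback` and the `Tier` signature are hypothetical names, and each tier is assumed to return a string on success and to raise or return `None` on failure.

```python
from typing import Callable, Optional, Sequence

# Hypothetical tier signature: (text, source_lang, target_lang) -> translation or None.
Tier = Callable[[str, str, str], Optional[str]]


def translate_with_fallback(text: str, src: str, tgt: str,
                            tiers: Sequence[Tier]) -> Optional[str]:
    """Try each tier in order and return the first non-empty result."""
    for tier in tiers:
        try:
            result = tier(text, src, tgt)
        except Exception:
            # A failing tier must not abort the chain; move on to the next one.
            continue
        if result:
            return result
    return None  # every tier failed or declined this language pair
```

Ordering the tiers from most to least preferred keeps the preferred backend in use whenever it works, while the later tiers only cover the gaps.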
 
  - Available demos:
  - [Yuri_Kizaki.mp3](https://www.mitsue.co.jp/service/audio_and_video/audio_production/media/narrators_sample/yuri_kizaki/03.mp3) — Japanese narration about website communication
  - [Film_Podcast.mp3](https://www.lightbulblanguages.co.uk/resources/audio/film-podcast.mp3) — French podcast discussing films like The Social Network
+ - [Tamil_Wikipedia_Interview.ogg](https://commons.wikimedia.org/wiki/File:Parvathisri-Wikipedia-Interview-Vanavil-fm.ogg) — Tamil language interview (36+ minutes)
  - [Car_Trouble.mp3](https://www.tuttlepublishing.com/content/docs/9780804844383/06-18%20Part2%20Car%20Trouble.mp3) — Conversation about waiting for a mechanic and basic assistance (2:45)
  - Static serving: demo audio is exposed at `/demo_audio/<filename>` for local preview.
  - The UI provides enhanced selectable cards under Demo Mode; once selected, the system loads a preview and renders a waveform using HTML5 Canvas (Web Audio API) before processing.
Dockerfile CHANGED
@@ -24,8 +24,12 @@ RUN pip install --no-cache-dir --upgrade pip && \
  COPY . .

  # Create necessary directories with proper permissions
  RUN mkdir -p templates static uploads outputs model_cache temp_files demo_results demo_audio \
- && chmod -R 755 templates static uploads outputs model_cache temp_files demo_results demo_audio

  # Set environment variables for Hugging Face Spaces
  ENV PYTHONPATH=/app \
@@ -44,7 +48,8 @@ ENV PYTHONPATH=/app \
  PYANNOTE_CACHE=/app/model_cache \
  MPLCONFIGDIR=/tmp/matplotlib \
  HUGGINGFACE_HUB_CACHE=/app/model_cache \
- HF_HUB_CACHE=/app/model_cache

  # Expose port for Hugging Face Spaces
  EXPOSE 7860
@@ -54,4 +59,16 @@ HEALTHCHECK --interval=30s --timeout=30s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:7860/api/system-info || exit 1

  # Preload models and start the application
- CMD ["python", "-c", "import subprocess; import time; print('🚀 Starting Enhanced Multilingual Audio Intelligence System...'); subprocess.run(['python', 'model_preloader.py']); print('✅ Models loaded successfully'); import uvicorn; uvicorn.run('web_app:app', host='0.0.0.0', port=7860, workers=1, log_level='info')"]
  COPY . .

  # Create necessary directories with proper permissions
+ # Fixed: Use 777 permissions for directories that need write access
  RUN mkdir -p templates static uploads outputs model_cache temp_files demo_results demo_audio \
+ /tmp/matplotlib /tmp/fontconfig \
+ && chmod -R 777 templates static \
+ && chmod -R 777 uploads outputs model_cache temp_files demo_results demo_audio \
+ && chmod -R 777 /tmp/matplotlib /tmp/fontconfig

  # Set environment variables for Hugging Face Spaces
  ENV PYTHONPATH=/app \
 
  PYANNOTE_CACHE=/app/model_cache \
  MPLCONFIGDIR=/tmp/matplotlib \
  HUGGINGFACE_HUB_CACHE=/app/model_cache \
+ HF_HUB_CACHE=/app/model_cache \
+ FONTCONFIG_PATH=/tmp/fontconfig

  # Expose port for Hugging Face Spaces
  EXPOSE 7860
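
Since this commit targets a write error, one way to confirm that the cache locations named in the `ENV` block are actually writable is a small startup probe. The sketch below is an assumption-laden illustration, not part of the commit: `unwritable_cache_dirs` is a hypothetical helper, and which variables to pass to it is up to the caller.

```python
import os
import tempfile


def unwritable_cache_dirs(env_names, environ=None):
    """Return {env_name: path} for variables whose directory cannot be written to."""
    environ = os.environ if environ is None else environ
    problems = {}
    for name in env_names:
        path = environ.get(name)
        if not path:
            continue  # variable not set; nothing to check
        try:
            os.makedirs(path, exist_ok=True)
            # Creating a temporary file is a more direct probe than os.access().
            with tempfile.NamedTemporaryFile(dir=path):
                pass
        except OSError:
            problems[name] = path
    return problems
```

Calling it with names such as `["MPLCONFIGDIR", "HF_HUB_CACHE", "PYANNOTE_CACHE"]` before loading models would surface a permission problem as one readable message instead of a failure deep inside a library.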
 
  CMD curl -f http://localhost:7860/api/system-info || exit 1

  # Preload models and start the application
+ # Fixed: Ensure directories exist with proper permissions at runtime
+ CMD ["python", "-c", "\
+ import os; \
+ import subprocess; \
+ import time; \
+ print('🚀 Starting Multilingual Audio Intelligence System...'); \
+ for dir in ['uploads', 'outputs', 'model_cache', 'temp_files', 'demo_results', '/tmp/matplotlib', '/tmp/fontconfig']: \
+ os.makedirs(dir, mode=0o777, exist_ok=True); \
+ subprocess.run(['python', 'model_preloader.py']); \
+ print('✅ Models loaded successfully'); \
+ import uvicorn; \
+ uvicorn.run('web_app:app', host='0.0.0.0', port=7860, workers=1, log_level='info')\
+ "]
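
The startup steps in the new `CMD` can also be written as a standalone script, sketched below. Python does not allow a compound statement such as `for` after a `;` on the same logical line, so the one-liner form is likely to raise a `SyntaxError` once the continuation lines are joined. The function names `ensure_writable_dirs` and `main` are assumptions for illustration. The directory list, `model_preloader.py`, and the `uvicorn.run` arguments are copied from the diff.

```python
import os
import subprocess
import sys

# Directory list copied from the CMD in the diff.
RUNTIME_DIRS = ["uploads", "outputs", "model_cache", "temp_files",
                "demo_results", "/tmp/matplotlib", "/tmp/fontconfig"]


def ensure_writable_dirs(paths, mode=0o777):
    """Create each directory if missing and set its permission bits explicitly."""
    for path in paths:
        os.makedirs(path, exist_ok=True)
        try:
            # os.makedirs() applies the process umask to `mode`, so chmod afterwards.
            os.chmod(path, mode)
        except PermissionError:
            # Directory owned by another user: warn and let later writes fail loudly.
            print(f"warning: could not chmod {path}", file=sys.stderr)


def main():
    print("Starting Multilingual Audio Intelligence System...")
    ensure_writable_dirs(RUNTIME_DIRS)
    # Mirrors the diff: the preloader's exit status is not checked.
    subprocess.run([sys.executable, "model_preloader.py"], check=False)
    import uvicorn  # imported late, as in the original command
    uvicorn.run("web_app:app", host="0.0.0.0", port=7860, workers=1, log_level="info")

# Call main() at the bottom of the file when using this as the container entrypoint.
```

Keeping the logic in a file rather than in a `python -c` string also avoids the quoting and line-continuation rules of the exec-form `CMD`.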
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Enhanced Multilingual Audio Intelligence System
  emoji: 🎵
  colorFrom: blue
  colorTo: purple
@@ -8,7 +8,7 @@ pinned: false
  short_description: AI for multilingual transcription & Indian language support
  ---

- # 🎵 Enhanced Multilingual Audio Intelligence System

  <p align="center">
  <img src="static/imgs/banner.png" alt="Multilingual Audio Intelligence System Banner" style="border: 1px solid black"/>
@@ -48,10 +48,10 @@ This AI-powered platform combines speaker diarization, automatic speech recognit

  The system includes sample audio files for testing and demonstration:

- - **Japanese Business Audio**: Professional voice message about website communication
- - **French Film Podcast**: Discussion about movies including Social Network and Paranormal Activity
- - **Tamil Wikipedia Interview**: Tamil language interview on collaborative knowledge sharing (36+ minutes)
- - **Hindi Car Trouble**: Hindi conversation about daily life scenarios (2:45)

  ### Demo Features
  - **Pre-processed Results**: Cached processing for quick demonstration
@@ -111,7 +111,7 @@ The system includes sample audio files for testing and demonstration:
  ### **1. Environment Setup**
  ```bash
  # Clone the enhanced repository
- git clone https://github.com/YourUsername/Enhanced-Multilingual-Audio-Intelligence-System.git
  cd Enhanced-Multilingual-Audio-Intelligence-System

  # Create conda environment (recommended)
@@ -153,34 +153,34 @@ python run_app.py --mode test # System testing

  ```
  Enhanced-Multilingual-Audio-Intelligence-System/
- ├── run_app.py # 🆕 Single entry point for all modes
- ├── web_app.py # Enhanced FastAPI application
- ├── src/ # 🆕 Organized source modules
- │ ├── main.py # Enhanced pipeline orchestrator
- │ ├── audio_processor.py # Enhanced with smart file management
- │ ├── speaker_diarizer.py # pyannote.audio integration
- │ ├── speech_recognizer.py # faster-whisper integration
- │ ├── translator.py # 🆕 3-tier hybrid translation system
- │ ├── output_formatter.py # Multi-format output generation
- │ ├── demo_manager.py # Enhanced demo file management
- │ ├── ui_components.py # Interactive UI components
- │ └── utils.py # Enhanced utility functions
- ├── demo_audio/ # Enhanced demo files
- │ ├── Yuri_Kizaki.mp3 # Japanese business communication
- │ ├── Film_Podcast.mp3 # French cinema discussion
- │ ├── Tamil_Wikipedia_Interview.ogg # 🆕 Tamil language interview
- │ └── Car_Trouble.mp3 # 🆕 Hindi daily conversation
  ├── templates/
- │ └── index.html # Enhanced UI with Indian language support
  ├── static/
- │ └── imgs/ # Enhanced screenshots and assets
- ├── model_cache/ # Intelligent model caching
- ├── outputs/ # Processing results
- ├── requirements.txt # Enhanced dependencies
- ├── README.md # This enhanced documentation
- ├── DOCUMENTATION.md # 🆕 Comprehensive technical docs
- ├── TECHNICAL_UNDERSTANDING.md # 🆕 System architecture guide
- └── files_which_are_not_needed/ # 🆕 Archived legacy files
  ```

  ## 🌟 Enhanced Usage Examples
@@ -246,23 +246,6 @@ MAX_FILE_SIZE_MB=200 # Smart file size limit
  - **Device Selection**: CPU (recommended), CUDA (if available)
  - **Cache Management**: Automatic model caching and cleanup

- ## Problem Statement 6 Alignment
-
- This system addresses **PS-6: "Language-Agnostic Speaker Identification/Verification & Diarization; and subsequent Transcription & Translation System"** with the following capabilities:
-
- ### **Current Implementation (70% Coverage)**
- - ✅ **Speaker Diarization**: pyannote.audio for "who spoke when" analysis
- - ✅ **Multilingual ASR**: faster-whisper with automatic language detection
- - ✅ **Neural Translation**: Multi-tier system for 100+ languages
- - ✅ **Audio Format Support**: WAV, MP3, OGG, FLAC, M4A
- - ✅ **User Interface**: Transcripts, visualizations, and translations
-
- ### **Enhanced Features (95% Complete)**
- - ✅ **Advanced Speaker Verification**: Multi-model speaker identification with SpeechBrain, Wav2Vec2, and enhanced feature extraction
- - ✅ **Advanced Noise Reduction**: ML-based enhancement with Sepformer, Demucs, and advanced signal processing
- - ✅ **Enhanced Code-switching**: Improved support for mixed language audio with context awareness
- - ✅ **Performance Optimization**: Real-time processing with advanced caching and optimization
-
  ## System Advantages

  ### **Reliability**
@@ -335,7 +318,7 @@ docker run -p 8000:7860 audio-intelligence
  ### **Hugging Face Spaces**
  ```yaml
  # spaces.yaml
- title: Enhanced Multilingual Audio Intelligence System
  emoji: 🎵
  colorFrom: blue
  colorTo: purple
@@ -368,4 +351,4 @@ This enhanced system is released under MIT License - see the [LICENSE](LICENSE)

  ---

- **A comprehensive solution for multilingual audio analysis and translation, designed to handle diverse language requirements and processing scenarios.**
 
  ---
+ title: Multilingual Audio Intelligence System
  emoji: 🎵
  colorFrom: blue
  colorTo: purple
 
  short_description: AI for multilingual transcription & Indian language support
  ---

+ # 🎵 Multilingual Audio Intelligence System

  <p align="center">
  <img src="static/imgs/banner.png" alt="Multilingual Audio Intelligence System Banner" style="border: 1px solid black"/>
 

  The system includes sample audio files for testing and demonstration:

+ - [Japanese Business Audio](https://www.mitsue.co.jp/service/audio_and_video/audio_production/media/narrators_sample/yuri_kizaki/03.mp3): Professional voice message about website communication
+ - [French Film Podcast](https://www.lightbulblanguages.co.uk/resources/audio/film-podcast.mp3): Discussion about movies including Social Network and Paranormal Activity
+ - [Tamil Wikipedia Interview](https://commons.wikimedia.org/wiki/File:Parvathisri-Wikipedia-Interview-Vanavil-fm.ogg): Tamil language interview on collaborative knowledge sharing (36+ minutes)
+ - [Hindi Car Trouble](https://www.tuttlepublishing.com/content/docs/9780804844383/06-18%20Part2%20Car%20Trouble.mp3): Hindi conversation about daily life scenarios (2:45)

  ### Demo Features
  - **Pre-processed Results**: Cached processing for quick demonstration
 
  ### **1. Environment Setup**
  ```bash
  # Clone the enhanced repository
+ git clone https://github.com/Prathameshv07/Multilingual-Audio-Intelligence-System.git
  cd Enhanced-Multilingual-Audio-Intelligence-System

  # Create conda environment (recommended)
 

  ```
  Enhanced-Multilingual-Audio-Intelligence-System/
+ ├── run_app.py # Single entry point for all modes
+ ├── web_app.py # Enhanced FastAPI application
+ ├── src/ # Organized source modules
+ │ ├── main.py # Enhanced pipeline orchestrator
+ │ ├── audio_processor.py # Enhanced with smart file management
+ │ ├── speaker_diarizer.py # pyannote.audio integration
+ │ ├── speech_recognizer.py # faster-whisper integration
+ │ ├── translator.py # 3-tier hybrid translation system
+ │ ├── output_formatter.py # Multi-format output generation
+ │ ├── demo_manager.py # Enhanced demo file management
+ │ ├── ui_components.py # Interactive UI components
+ │ └── utils.py # Enhanced utility functions
+ ├── demo_audio/ # Enhanced demo files
+ │ ├── Yuri_Kizaki.mp3 # Japanese business communication
+ │ ├── Film_Podcast.mp3 # French cinema discussion
+ │ ├── Tamil_Wikipedia_Interview.ogg # Tamil language interview
+ │ └── Car_Trouble.mp3 # Hindi daily conversation
  ├── templates/
+ │ └── index.html # Enhanced UI with Indian language support
  ├── static/
+ │ └── imgs/ # Enhanced screenshots and assets
+ ├── model_cache/ # Intelligent model caching
+ ├── outputs/ # Processing results
+ ├── requirements.txt # Enhanced dependencies
+ ├── README.md # This enhanced documentation
+ ├── DOCUMENTATION.md # Comprehensive technical docs
+ ├── TECHNICAL_UNDERSTANDING.md # System architecture guide
+ └── files_which_are_not_needed/ # Archived legacy files
  ```

  ## 🌟 Enhanced Usage Examples
 
  - **Device Selection**: CPU (recommended), CUDA (if available)
  - **Cache Management**: Automatic model caching and cleanup

  ## System Advantages

  ### **Reliability**
 
  ### **Hugging Face Spaces**
  ```yaml
  # spaces.yaml
+ title: Multilingual Audio Intelligence System
  emoji: 🎵
  colorFrom: blue
  colorTo: purple
 

  ---

+ **A comprehensive solution for multilingual audio analysis and translation, designed to handle diverse language requirements and processing scenarios.**
TECHNICAL_UNDERSTANDING.md CHANGED
@@ -1,8 +1,8 @@
- # Technical Understanding - Enhanced Multilingual Audio Intelligence System

  ## Architecture Overview

- This document provides technical insights into the enhanced multilingual audio intelligence system, designed to address comprehensive audio analysis requirements. The system incorporates **Indian language support**, **multi-tier translation**, **waveform visualization**, and **optimized performance** for various deployment scenarios.

  ## System Architecture

 
+ # Technical Understanding - Multilingual Audio Intelligence System

  ## Architecture Overview

+ This document provides technical insights into the multilingual audio intelligence system, designed to address comprehensive audio analysis requirements. The system incorporates **Indian language support**, **multi-tier translation**, **waveform visualization**, and **optimized performance** for various deployment scenarios.

  ## System Architecture

spaces.yaml CHANGED
@@ -1,4 +1,4 @@
- title: Enhanced Multilingual Audio Intelligence System
  emoji: 🎵
  colorFrom: blue
  colorTo: purple
 
+ title: Multilingual Audio Intelligence System
  emoji: 🎵
  colorFrom: blue
  colorTo: purple
static/imgs/banner.png CHANGED

Git LFS Details

  • SHA256: 82d55557be2da7a05d864bf4403ec7cba10d5ef1326feb0eba57d4c2d9be02d7
  • Pointer size: 130 Bytes
  • Size of remote file: 89 kB

Git LFS Details

  • SHA256: 9a5ed1a0acb8fc7a8174cb0dfbd8df2849380be7d7754d726b548816438ecdd1
  • Pointer size: 130 Bytes
  • Size of remote file: 89 kB