Spaces: AndroidGuy/Speaker-Diarization

Saiyaswanth007 committed on May 27

Commit 4641c1c

1 Parent(s): 9eec0a3

Backend connection

Files changed (3)

README.md +64 -0
inference.py +61 -11
ui.py +307 -39

README.md CHANGED

@@ -9,3 +9,67 @@ license: mit
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Real-Time Speaker Diarization
+This project implements real-time speaker diarization using WebRTC, FastAPI, and Gradio. It transcribes speech and labels each speaker in real time.
+## Architecture
+The system is split into two components:
+1. **Model Server (Hugging Face Space)**: Runs the speech recognition and speaker diarization models
+2. **Signaling Server (Render)**: Handles WebRTC signaling for direct audio streaming from the browser
+## Deployment Instructions
+### Deploy Model Server on Hugging Face Space
+1. Create a new Space on Hugging Face (Docker SDK)
+2. Upload all files from the `Speaker-Diarization` directory
+3. In Space settings:
+   - Set Hardware to CPU (or GPU if available)
+   - Set visibility to Public
+   - Environment: Make sure the Docker SDK is selected
+### Deploy Signaling Server on Render
+1. Create a new Render Web Service
+2. Connect to your GitHub repo containing the `render-signal` directory
+3. Configure Render service:
+   - Set Build Command: `cd render-signal && pip install -r requirements.txt`
+   - Set Start Command: `cd render-signal && python backend.py`
+   - Select Environment: Python 3
+   - Set Environment Variables:
+     - `HF_SPACE_URL`: Set to your Hugging Face Space URL (e.g., `your-username-speaker-diarization.hf.space`)
+### Update Configuration
+After both services are deployed:
+1. Update `ui.py` on your Hugging Face Space:
+   - Change `RENDER_SIGNALING_URL` to your Render app URL (`wss://your-app.onrender.com/stream`)
+   - Make sure `HF_SPACE_URL` matches your actual Hugging Face Space URL
+2. Update `backend.py` on your Render service:
+   - Set `API_WS` to your Hugging Face Space WebSocket URL (`wss://your-username-speaker-diarization.hf.space/ws_inference`)
+## Usage
+1. Open your Hugging Face Space URL in a web browser
+2. Click "Start Listening" to begin
+3. Speak into your microphone
+4. The system transcribes your speech and identifies the different speakers in real time
+## Technology Stack
+- **Frontend**: Gradio UI with WebRTC for audio streaming
+- **Signaling**: FastRTC on Render for WebRTC signaling
+- **Backend**: FastAPI + WebSockets
+- **Models**:
+  - SpeechBrain ECAPA-TDNN for speaker embeddings
+  - Automatic Speech Recognition for transcription
+## License
+MIT
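
The "Update Configuration" steps in the README pair an `https://` Space URL with a `wss://…/ws_inference` endpoint. A minimal sketch of that mapping follows, assuming the endpoint path stays `/ws_inference`; the helper name `to_ws_url` is hypothetical and not part of this commit.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ws_url(space_url: str, path: str = "/ws_inference") -> str:
    """Map an http(s) Space URL to the matching ws(s) endpoint URL."""
    # Accept bare hosts such as "user-space.hf.space" by assuming https.
    if "://" not in space_url:
        space_url = "https://" + space_url
    parts = urlsplit(space_url)
    # http -> ws, https -> wss; any other scheme is passed through unchanged.
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(to_ws_url("https://your-username-speaker-diarization.hf.space"))
```

Deriving the WebSocket URL from a single `HF_SPACE_URL` value keeps `ui.py` and `backend.py` from drifting apart when the Space is renamed.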

inference.py CHANGED

@@ -1,8 +1,10 @@
-from fastapi import FastAPI, WebSocket
 from fastapi.middleware.cors import CORSMiddleware
 from shared import RealtimeSpeakerDiarization
 import uvicorn
 import logging
 # Set up logging
 logging.basicConfig(level=logging.INFO)
@@ -21,33 +23,81 @@ app.add_middleware(
 )
 # Initialize the diarization system
 diart = RealtimeSpeakerDiarization()
 success = diart.initialize_models()
 logger.info(f"Models initialized: {success}")
-diart.start_recording()
 @app.get("/health")
 async def health_check():
-    return {"status": "healthy", "system_running": diart.is_running}
 @app.websocket("/ws_inference")
 async def ws_inference(ws: WebSocket):
     """WebSocket endpoint for real-time audio processing"""
     await ws.accept()
-    logger.info("WebSocket connection established")
     try:
         async for chunk in ws.iter_bytes():
-            # Process audio data
-            diart.process_audio_chunk(chunk, sample_rate=16000)
-            # Send back conversation results
-            result = diart.get_formatted_conversation()
-            await ws.send_text(result)
     except Exception as e:
         logger.error(f"WebSocket error: {e}")
     finally:
-        logger.info("WebSocket connection closed")
 @app.get("/conversation")
 async def get_conversation():

+from fastapi import FastAPI, WebSocket, WebSocketDisconnect
 from fastapi.middleware.cors import CORSMiddleware
 from shared import RealtimeSpeakerDiarization
+import numpy as np
 import uvicorn
 import logging
+import asyncio
 # Set up logging
 logging.basicConfig(level=logging.INFO)
 )
 # Initialize the diarization system
+logger.info("Initializing diarization system...")
 diart = RealtimeSpeakerDiarization()
 success = diart.initialize_models()
 logger.info(f"Models initialized: {success}")
+if success:
+    diart.start_recording()
+# Track active WebSocket connections
+active_connections = set()
+# Periodic status update function
+async def send_conversation_updates():
+    """Periodically send conversation updates to all connected clients"""
+    while True:
+        if active_connections:
+            try:
+                # Get current conversation HTML
+                conversation_html = diart.get_formatted_conversation()
+                # Send to all active connections
+                for ws in active_connections.copy():
+                    try:
+                        await ws.send_text(conversation_html)
+                    except Exception as e:
+                        logger.error(f"Error sending to WebSocket: {e}")
+                        active_connections.discard(ws)
+            except Exception as e:
+                logger.error(f"Error in conversation update: {e}")
+        # Wait before sending next update
+        await asyncio.sleep(0.5)  # 500ms update interval
+@app.on_event("startup")
+async def startup_event():
+    """Start background tasks when the app starts"""
+    asyncio.create_task(send_conversation_updates())
 @app.get("/health")
 async def health_check():
+    """Health check endpoint"""
+    return {
+        "status": "healthy",
+        "system_running": diart.is_running,
+        "active_connections": len(active_connections)
+    }
 @app.websocket("/ws_inference")
 async def ws_inference(ws: WebSocket):
     """WebSocket endpoint for real-time audio processing"""
     await ws.accept()
+    active_connections.add(ws)
+    logger.info(f"WebSocket connection established. Total connections: {len(active_connections)}")
     try:
+        # Send initial conversation state
+        conversation_html = diart.get_formatted_conversation()
+        await ws.send_text(conversation_html)
+        # Process incoming audio chunks
         async for chunk in ws.iter_bytes():
+            try:
+                # Process raw audio bytes
+                if chunk:
+                    # Process audio data - this updates the internal conversation state
+                    diart.process_audio_chunk(chunk)
+            except Exception as e:
+                logger.error(f"Error processing audio chunk: {e}")
+    except WebSocketDisconnect:
+        logger.info("WebSocket disconnected")
     except Exception as e:
         logger.error(f"WebSocket error: {e}")
     finally:
+        active_connections.discard(ws)
+        logger.info(f"WebSocket connection closed. Remaining connections: {len(active_connections)}")
 @app.get("/conversation")
 async def get_conversation():
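
The new `send_conversation_updates` loop iterates over a copy of `active_connections` and discards any socket whose send fails. The sketch below isolates that broadcast step so it can be exercised without FastAPI; `FakeSocket` and `broadcast` are hypothetical names used only for illustration, and the real code sends to `fastapi.WebSocket` objects instead.

```python
import asyncio


class FakeSocket:
    """Hypothetical stand-in for a WebSocket; raises on send when `broken`."""

    def __init__(self, broken: bool = False):
        self.broken = broken
        self.sent = []

    async def send_text(self, text: str) -> None:
        if self.broken:
            raise ConnectionError("peer went away")
        self.sent.append(text)


async def broadcast(connections: set, text: str) -> int:
    """Send `text` to every connection and drop the ones that raise."""
    delivered = 0
    # Iterate over a copy so the set can be mutated inside the loop.
    for ws in connections.copy():
        try:
            await ws.send_text(text)
            delivered += 1
        except Exception:
            connections.discard(ws)
    return delivered


async def demo():
    good, bad = FakeSocket(), FakeSocket(broken=True)
    conns = {good, bad}
    delivered = await broadcast(conns, "<p>hello</p>")
    return delivered, conns == {good}, good.sent


result = asyncio.run(demo())
```

Copying the set before iterating matters because `discard` during iteration over the live set would raise `RuntimeError: Set changed size during iteration`.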

ui.py CHANGED

@@ -2,58 +2,251 @@ import gradio as gr
 from fastapi import FastAPI
 from shared import DEFAULT_CHANGE_THRESHOLD, DEFAULT_MAX_SPEAKERS, ABSOLUTE_MAX_SPEAKERS
-# Replace with your actual space URL when deployed
-API_WS = "wss://androidguy-speaker-diarization.hf.space/ws_inference"
 def build_ui():
     """Build Gradio UI for speaker diarization"""
     with gr.Blocks(title="Real-time Speaker Diarization", theme=gr.themes.Soft()) as demo:
         gr.Markdown("# 🎤 Live Speaker Diarization")
         gr.Markdown("Real-time speech recognition with automatic speaker identification")
         with gr.Row():
             with gr.Column(scale=2):
-                # Conversation display with embedded JavaScript
-                output = gr.HTML(
                     """
-                    <div class='output' style='padding:20px; background:#000000; border-radius:10px; min-height:300px;'>
-                      <i>Click 'Start Listening' to begin…</i>
                     </div>
                     <script>
-                    const API_WS = 'wss://androidguy-speaker-diarization.hf.space/ws_inference';
-                    let ws, recorder, mediaStream;
-                    async function startStream() {
-                      try {
-                        mediaStream = await navigator.mediaDevices.getUserMedia({audio:true});
-                        ws = new WebSocket(API_WS);
-                        ws.onopen = () => {
-                          recorder = new MediaRecorder(mediaStream, {mimeType:'audio/webm'});
-                          recorder.ondataavailable = e => {
-                            if (ws.readyState===1 && e.data.size>0) ws.send(e.data);
-                          };
-                          recorder.start(200);
                         };
-                        ws.onmessage = evt => {
-                          document.querySelector('.output').innerHTML = evt.data;
                         };
-                        ws.onerror = err => console.error('WebSocket error:', err);
-                        ws.onclose = () => stopStream();
-                      } catch (err) {
-                        console.error('Error starting stream:', err);
-                        alert(`Error: ${err.message}`);
-                      }
                     }
-                    function stopStream() {
-                      recorder?.state!=='inactive' && recorder.stop();
-                      mediaStream?.getTracks().forEach(t=>t.stop());
-                      ws?.close();
                     }
                     document.addEventListener('DOMContentLoaded', () => {
-                      document.querySelector('button[aria-label="Start Listening"]').onclick = startStream;
-                      document.querySelector('button[aria-label="Stop"]').onclick = stopStream;
                     });
                     </script>
                     """,
@@ -67,11 +260,14 @@ def build_ui():
                     clear_btn = gr.Button("🗑️ Clear", variant="secondary", size="lg")
                 # Status display
-                status_output = gr.Textbox(
-                    label="System Status",
-                    value="Click 'Start Listening' to begin...",
-                    lines=8,
-                    interactive=False
                 )
             with gr.Column(scale=1):
@@ -84,7 +280,7 @@ def build_ui():
                     step=0.05,
                     value=DEFAULT_CHANGE_THRESHOLD,
                     label="Speaker Change Sensitivity",
-                    info="Lower = more sensitive"
                 )
                 max_speakers_slider = gr.Slider(
@@ -101,16 +297,88 @@ def build_ui():
                 gr.Markdown("""
                 ## 📋 Instructions
                 1. **Start Listening** - allows browser to access microphone
-                2. **Speak** - system will recognize different speakers
                 3. **Stop** when finished
                 ## 🎨 Speaker Colors
                 - 🔴 Speaker 1 (Red)
                 - 🟢 Speaker 2 (Teal)
                 - 🔵 Speaker 3 (Blue)
                 - 🟡 Speaker 4 (Green)
                 """)
     return demo
 # Create Gradio interface

 from fastapi import FastAPI
 from shared import DEFAULT_CHANGE_THRESHOLD, DEFAULT_MAX_SPEAKERS, ABSOLUTE_MAX_SPEAKERS
+# Connection configuration (separate signaling server from model server)
+# These will be replaced at deployment time with the correct URLs
+RENDER_SIGNALING_URL = "wss://your-render-app.onrender.com/stream"
+HF_SPACE_URL = "https://androidguy-speaker-diarization.hf.space"
 def build_ui():
     """Build Gradio UI for speaker diarization"""
     with gr.Blocks(title="Real-time Speaker Diarization", theme=gr.themes.Soft()) as demo:
+        # Add configuration variables to page using custom component
+        gr.HTML(
+            f"""
+            <!-- Configuration parameters -->
+            <script>
+                window.RENDER_SIGNALING_URL = "{RENDER_SIGNALING_URL}";
+                window.HF_SPACE_URL = "{HF_SPACE_URL}";
+            </script>
+            """
+        )
+        # Header and description
         gr.Markdown("# 🎤 Live Speaker Diarization")
         gr.Markdown("Real-time speech recognition with automatic speaker identification")
+        # Status indicator
+        connection_status = gr.HTML(
+            """<div class="status-indicator">
+                <span id="status-text" style="color:#888;">Waiting to connect...</span>
+                <span id="status-icon" style="width:10px; height:10px; display:inline-block;
+                    background-color:#888; border-radius:50%; margin-left:5px;"></span>
+            </div>"""
+        )
         with gr.Row():
             with gr.Column(scale=2):
+                # Conversation display with embedded JavaScript for WebRTC and audio handling
+                conversation_display = gr.HTML(
                     """
+                    <div class='output' id="conversation" style='padding:20px; background:#111; border-radius:10px;
+                      min-height:400px; font-family:Arial; font-size:16px; line-height:1.5; overflow-y:auto;'>
+                      <i>Click 'Start Listening' to begin...</i>
                     </div>
                     <script>
+                    // Global variables
+                    let rtcConnection;
+                    let mediaStream;
+                    let wsConnection;
+                    let statusUpdateInterval;
+                    // Check connection to HF space
+                    async function checkHfConnection() {
+                        try {
+                            let response = await fetch(`${window.HF_SPACE_URL}/health`);
+                            return response.ok;
+                        } catch (err) {
+                            return false;
+                        }
+                    }
+                    // Start the connection and audio streaming
+                    async function startStreaming() {
+                        try {
+                            // Update status
+                            updateStatus('connecting');
+                            // Request microphone access
+                            mediaStream = await navigator.mediaDevices.getUserMedia({audio: {
+                                echoCancellation: true,
+                                noiseSuppression: true,
+                                autoGainControl: true
+                            }});
+                            // Set up WebRTC connection to Render signaling server
+                            await setupWebRTC();
+                            // Also connect WebSocket directly to HF Space for conversation updates
+                            setupWebSocket();
+                            // Start status update interval
+                            statusUpdateInterval = setInterval(updateConnectionInfo, 5000);
+                            // Update status
+                            updateStatus('connected');
+                            document.getElementById("conversation").innerHTML = "<i>Connected! Start speaking...</i>";
+                        } catch (err) {
+                            console.error('Error starting stream:', err);
+                            updateStatus('error', err.message);
+                        }
+                    }
+                    // Set up WebRTC connection to Render signaling server
+                    async function setupWebRTC() {
+                        if (rtcConnection) {
+                            rtcConnection.close();
+                        }
+                        // Create new RTCPeerConnection
+                        rtcConnection = new RTCPeerConnection();
+                        // Add audio track to connection
+                        mediaStream.getAudioTracks().forEach(track => {
+                            rtcConnection.addTrack(track, mediaStream);
+                        });
+                        // Create data channel for signaling
+                        const dataChannel = rtcConnection.createDataChannel('audio');
+                        // Create and set local description
+                        const offer = await rtcConnection.createOffer();
+                        await rtcConnection.setLocalDescription(offer);
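+                        // Connect to signaling server and exchange SDP.
+                        // fetch() only accepts http(s) URLs, so map the ws(s) scheme to http(s).
+                        const signalingUrl = window.RENDER_SIGNALING_URL.replace(/^ws/, 'http');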
+                        const response = await fetch(signalingUrl, {
+                            method: 'POST',
+                            headers: { 'Content-Type': 'application/json' },
+                            body: JSON.stringify({ sdp: rtcConnection.localDescription })
+                        });
+                        const data = await response.json();
+                        await rtcConnection.setRemoteDescription(new RTCSessionDescription(data.sdp));
+                        // Handle ICE candidates
+                        rtcConnection.onicecandidate = event => {
+                            if (event.candidate) {
+                                fetch(signalingUrl, {
+                                    method: 'POST',
+                                    headers: { 'Content-Type': 'application/json' },
+                                    body: JSON.stringify({ candidate: event.candidate })
+                                });
+                            }
+                        };
+                    }
+                    // Set up WebSocket connection to HF Space for conversation updates
+                    function setupWebSocket() {
+                        const wsUrl = `${window.HF_SPACE_URL.replace('http', 'ws')}/ws_inference`;
+                        wsConnection = new WebSocket(wsUrl);
+                        wsConnection.onopen = () => {
+                            console.log('WebSocket connection established');
+                        };
+                        wsConnection.onmessage = (event) => {
+                            document.getElementById("conversation").innerHTML = event.data;
+                            // Auto-scroll to bottom
+                            const container = document.getElementById("conversation");
+                            container.scrollTop = container.scrollHeight;
+                        };
+                        wsConnection.onerror = (error) => {
+                            console.error('WebSocket error:', error);
+                            updateStatus('warning', 'WebSocket error');
                         };
+                        wsConnection.onclose = () => {
+                            console.log('WebSocket connection closed');
+                            // Reconnect after a delay, unless stopStreaming() has already released the microphone
+                            if (mediaStream) setTimeout(setupWebSocket, 3000);
                         };
+                    }
+                    // Update connection info in the UI
+                    async function updateConnectionInfo() {
+                        try {
+                            const hfConnected = await checkHfConnection();
+                            if (!hfConnected) {
+                                updateStatus('warning', 'HF Space connection issue');
+                            } else if (rtcConnection?.connectionState === 'connected' ||
+                                      rtcConnection?.iceConnectionState === 'connected') {
+                                updateStatus('connected');
+                            } else {
+                                updateStatus('warning', 'Connection unstable');
+                            }
+                        } catch (err) {
+                            console.error('Error updating connection info:', err);
+                        }
+                    }
+                    // Update status indicator
+                    function updateStatus(status, message = '') {
+                        const statusText = document.getElementById('status-text');
+                        const statusIcon = document.getElementById('status-icon');
+                        switch(status) {
+                            case 'connected':
+                                statusText.textContent = 'Connected';
+                                statusIcon.style.backgroundColor = '#4CAF50';
+                                break;
+                            case 'connecting':
+                                statusText.textContent = 'Connecting...';
+                                statusIcon.style.backgroundColor = '#FFC107';
+                                break;
+                            case 'disconnected':
+                                statusText.textContent = 'Disconnected';
+                                statusIcon.style.backgroundColor = '#9E9E9E';
+                                break;
+                            case 'error':
+                                statusText.textContent = 'Error: ' + message;
+                                statusIcon.style.backgroundColor = '#F44336';
+                                break;
+                            case 'warning':
+                                statusText.textContent = 'Warning: ' + message;
+                                statusIcon.style.backgroundColor = '#FF9800';
+                                break;
+                            default:
+                                statusText.textContent = 'Unknown';
+                                statusIcon.style.backgroundColor = '#9E9E9E';
+                        }
                     }
+                    // Stop streaming and clean up
+                    function stopStreaming() {
+                        // Close WebRTC connection
+                        if (rtcConnection) {
+                            rtcConnection.close();
+                            rtcConnection = null;
+                        }
+                        // Close WebSocket
+                        if (wsConnection) {
+                            wsConnection.close();
+                            wsConnection = null;
+                        }
+                        // Stop all tracks in media stream
+                        if (mediaStream) {
+                            mediaStream.getTracks().forEach(track => track.stop());
+                            mediaStream = null;
+                        }
+                        // Clear interval
+                        if (statusUpdateInterval) {
+                            clearInterval(statusUpdateInterval);
+                            statusUpdateInterval = null;
+                        }
+                        // Update status
+                        updateStatus('disconnected');
                     }
+                    // Set up event listeners when the DOM is loaded
                     document.addEventListener('DOMContentLoaded', () => {
+                        updateStatus('disconnected');
                     });
                     </script>
                     """,
                     clear_btn = gr.Button("🗑️ Clear", variant="secondary", size="lg")
                 # Status display
+                status_output = gr.Markdown(
+                    """
+                    ## System Status
+                    Waiting to connect...
+                    *Click Start Listening to begin*
+                    """,
+                    label="Status Information"
                 )
             with gr.Column(scale=1):
                     step=0.05,
                     value=DEFAULT_CHANGE_THRESHOLD,
                     label="Speaker Change Sensitivity",
+                    info="Lower = more sensitive (more speaker changes)"
                 )
                 max_speakers_slider = gr.Slider(
                 gr.Markdown("""
                 ## 📋 Instructions
                 1. **Start Listening** - allows browser to access microphone
+                2. **Speak** - system will transcribe and identify speakers
                 3. **Stop** when finished
+                4. **Clear** to reset conversation
                 ## 🎨 Speaker Colors
                 - 🔴 Speaker 1 (Red)
                 - 🟢 Speaker 2 (Teal)
                 - 🔵 Speaker 3 (Blue)
                 - 🟡 Speaker 4 (Green)
+                - ⭐ Speaker 5 (Yellow)
+                - 🟣 Speaker 6 (Plum)
+                - 🟤 Speaker 7 (Mint)
+                - 🟠 Speaker 8 (Gold)
                 """)
+        # JavaScript to connect buttons to the script functions
+        gr.HTML("""
+        <script>
+            // Wait for Gradio to fully load
+            document.addEventListener('DOMContentLoaded', () => {
+                // Wait a bit for Gradio buttons to be created
+                setTimeout(() => {
+                    // Get the buttons
+                    const startBtn = document.querySelector('button[aria-label="Start Listening"]');
+                    const stopBtn = document.querySelector('button[aria-label="Stop"]');
+                    const clearBtn = document.querySelector('button[aria-label="Clear"]');
+                    if (startBtn) startBtn.onclick = () => startStreaming();
+                    if (stopBtn) stopBtn.onclick = () => stopStreaming();
+                    if (clearBtn) clearBtn.onclick = () => {
+                        // Make API call to clear conversation
+                        fetch(`${window.HF_SPACE_URL}/clear`, {
+                            method: 'POST'
+                        }).then(resp => resp.json())
+                        .then(data => {
+                            document.getElementById("conversation").innerHTML =
+                                "<i>Conversation cleared. Start speaking again...</i>";
+                        });
+                    }
+                    // Set up settings update
+                    const updateBtn = document.querySelector('button[aria-label="Update Settings"]');
+                    if (updateBtn) updateBtn.onclick = () => {
+                        const threshold = document.querySelector('input[aria-label="Speaker Change Sensitivity"]').value;
+                        const maxSpeakers = document.querySelector('input[aria-label="Maximum Speakers"]').value;
+                        fetch(`${window.HF_SPACE_URL}/settings?threshold=${threshold}&max_speakers=${maxSpeakers}`, {
+                            method: 'POST'
+                        }).then(resp => resp.json())
+                        .then(data => {
+                            const statusOutput = document.querySelector('.prose');
+                            if (statusOutput) {
+                                statusOutput.innerHTML = `
+                                    <h2>System Status</h2>
+                                    <p>Settings updated:</p>
+                                    <ul>
+                                        <li>Threshold: ${threshold}</li>
+                                        <li>Max Speakers: ${maxSpeakers}</li>
+                                    </ul>
+                                `;
+                            }
+                        });
+                    }
+                }, 1000);
+            });
+        </script>
+        """)
+        # Set up periodic status updates
+        def get_status():
+            """API call to get system status - called periodically"""
+            import requests
+            try:
+                # A timeout keeps a stalled Space from blocking the status poll
+                resp = requests.get(f"{HF_SPACE_URL}/status", timeout=5)
+                if resp.status_code == 200:
+                    return resp.json().get('status', 'No status information')
+                return "Error getting status"
+            except Exception as e:
+                return f"Connection error: {str(e)}"
+        # gr.Timer takes the interval as its value; the callback is attached via .tick()
+        status_timer = gr.Timer(5)
+        status_timer.tick(fn=get_status, outputs=status_output)
     return demo
 # Create Gradio interface
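
The `get_status` helper added in `ui.py` polls `{HF_SPACE_URL}/status` and falls back to an error string. A standard-library sketch of the same idea follows, with an explicit timeout and an injectable `opener` so it can be tested offline. The `opener` parameter and the 5-second default are assumptions for illustration; the commit itself uses `requests`.

```python
import json
import urllib.request


def get_status(base_url: str, timeout: float = 5.0,
               opener=urllib.request.urlopen) -> str:
    """Return the 'status' field from `<base_url>/status`, or an error string."""
    try:
        with opener(f"{base_url}/status", timeout=timeout) as resp:
            if resp.status != 200:
                return "Error getting status"
            payload = json.loads(resp.read().decode("utf-8"))
            return payload.get("status", "No status information")
    except Exception as exc:
        # Covers timeouts, DNS failures, HTTP errors, and malformed JSON.
        return f"Connection error: {exc}"


class FakeResponse:
    """Hypothetical canned response used to exercise get_status offline."""

    status = 200

    def __init__(self, body: bytes):
        self._body = body

    def read(self) -> bytes:
        return self._body

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False


def fake_ok(url, timeout):
    return FakeResponse(b'{"status": "running"}')


def fake_timeout(url, timeout):
    raise TimeoutError("timed out")
```

Because the function always returns a string, a periodic caller such as a Gradio timer callback can write the result straight into the status panel without its own error handling.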