---
title: Speaker Diarization
emoji: 🔥
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Real-Time Speaker Diarization
This project implements real-time speaker diarization using WebRTC, FastAPI, and Gradio. It automatically transcribes speech and identifies the different speakers in real time.
## Architecture
The system is split into two components:

- **Model Server** (Hugging Face Space): runs the speech recognition and speaker diarization models
- **Signaling Server** (Render): handles WebRTC signaling so the browser can stream audio directly (a minimal relay sketch follows below)
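
To make the split concrete, here is a minimal sketch of the relay idea. This is **not** this repo's actual `backend.py`: it assumes the model server exposes the `/ws_inference` WebSocket configured later in this README, and it uses a simplified lockstep send/receive loop.

```python
# Hypothetical relay sketch: accept a browser WebSocket and forward audio
# frames to the model server's inference WebSocket.
import os

import websockets
from fastapi import FastAPI, WebSocket

app = FastAPI()

# Assumed endpoint; see the configuration steps below for the real values.
API_WS = os.environ.get(
    "API_WS", "wss://your-username-speaker-diarization.hf.space/ws_inference"
)


@app.websocket("/stream")
async def stream(client: WebSocket) -> None:
    await client.accept()
    async with websockets.connect(API_WS) as upstream:
        while True:
            audio = await client.receive_bytes()  # raw audio from the browser
            await upstream.send(audio)            # forward to the model server
            await client.send_text(await upstream.recv())  # relay results back
```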
## Deployment Instructions
### Deploy Model Server on Hugging Face Space
1. Create a new Space on Hugging Face (Docker SDK)
2. Upload all files from the `Speaker-Diarization` directory
3. In the Space settings:
   - Set Hardware to CPU (or GPU if available)
   - Set visibility to Public
   - Make sure the Docker SDK is selected under Environment
### Deploy Signaling Server on Render
1. Create a new Render Web Service
2. Connect it to your GitHub repo containing the `render-signal` directory
3. Configure the Render service:
   - Build Command: `cd render-signal && pip install -r requirements.txt`
   - Start Command: `cd render-signal && python backend.py`
   - Environment: Python 3
4. Set Environment Variables:
   - `HF_SPACE_URL`: your Hugging Face Space URL (e.g., `your-username-speaker-diarization.hf.space`; see the sketch below)
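
For illustration, the signaling server could derive the inference endpoint from that variable along these lines (a sketch only; the actual variable handling in this repo's `backend.py` may differ):

```python
import os

# HF_SPACE_URL is the bare host set on Render, e.g.
# "your-username-speaker-diarization.hf.space" (no scheme).
hf_space = os.environ["HF_SPACE_URL"]
API_WS = f"wss://{hf_space}/ws_inference"  # model server's inference WebSocket
```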
## Update Configuration
After both services are deployed:
1. Update `ui.py` on your Hugging Face Space:
   - Change `RENDER_SIGNALING_URL` to your Render app URL (`wss://your-app.onrender.com/stream`)
   - Make sure `HF_SPACE_URL` matches your actual Hugging Face Space URL
2. Update `backend.py` on your Render service:
   - Set `API_WS` to your Hugging Face Space WebSocket URL (`wss://your-username-speaker-diarization.hf.space/ws_inference`); the combined example below shows how the values line up
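
Put together, the cross-references look like this (illustrative placeholder URLs only; substitute your real deployment URLs):

```python
# ui.py on the Hugging Face Space
RENDER_SIGNALING_URL = "wss://your-app.onrender.com/stream"
HF_SPACE_URL = "your-username-speaker-diarization.hf.space"

# backend.py on the Render service
API_WS = "wss://your-username-speaker-diarization.hf.space/ws_inference"
```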
## Usage
1. Open your Hugging Face Space URL in a web browser
2. Click "Start Listening" to begin
3. Speak into your microphone
4. The system transcribes your speech and identifies different speakers in real time
## Technology Stack
- **Frontend**: Gradio UI with WebRTC for audio streaming
- **Signaling**: FastRTC on Render for WebRTC signaling
- **Backend**: FastAPI + WebSockets
- **Models**:
  - SpeechBrain ECAPA-TDNN for speaker embeddings (see the embedding sketch below)
  - Automatic speech recognition (ASR) for transcription
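
As a rough sketch of how the speaker-embedding model is typically used, via SpeechBrain's documented pretrained interface (the project's diarization code may wrap this differently):

```python
import torch
from speechbrain.pretrained import EncoderClassifier

# Load the pretrained ECAPA-TDNN speaker encoder from the Hub.
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# One second of 16 kHz mono audio; in this app the frames come from the mic.
waveform = torch.randn(1, 16000)
embedding = encoder.encode_batch(waveform)  # -> (1, 1, 192) speaker embedding
```

Segments whose embeddings are close under cosine similarity are attributed to the same speaker, which is the core of the diarization step.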
## License
MIT