# Real-time Speech Transcription with FastRTC and Whisper
This application provides real-time speech transcription using FastRTC for audio streaming and Whisper for speech recognition.
## Features
- Real-time audio streaming
- Voice Activity Detection (VAD)
- Multi-language support
- Low latency transcription
## Usage
- Click the microphone button to start recording
- Speak into your microphone
- See your speech transcribed in real-time
- Click the microphone button again to stop recording
## Technical Details
- Uses FastRTC for WebRTC streaming
- Powered by Whisper large-v3-turbo model
- Voice Activity Detection for optimal transcription
- FastAPI backend with WebSocket support
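For a sense of how these pieces fit together, here is a minimal sketch of a FastRTC stream mounted on a FastAPI app. The handler body is illustrative (a placeholder standing in for the real Whisper call), not this app's exact code:

```python
import numpy as np
from fastapi import FastAPI
from fastrtc import AdditionalOutputs, ReplyOnPause, Stream

def transcribe(audio: tuple[int, np.ndarray]):
    """Receive an audio chunk and yield its transcription."""
    sample_rate, samples = audio
    text = "..."  # placeholder: run the ASR model on `samples` here
    yield AdditionalOutputs(text)

# ReplyOnPause wraps the handler with voice activity detection:
# the handler fires once the speaker pauses.
stream = Stream(handler=ReplyOnPause(transcribe), modality="audio", mode="send-receive")

app = FastAPI()
stream.mount(app)  # expose the streaming endpoints on the FastAPI app
```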
## Environment Variables
The following environment variables can be configured:
- `MODEL_ID`: Hugging Face model ID (default: `openai/whisper-large-v3-turbo`)
- `APP_MODE`: Set to `deployed` for Hugging Face Spaces
- `UI_MODE`: Set to `fastapi` for the custom UI
## Credits
- FastRTC for WebRTC streaming
- Whisper for speech recognition
- Hugging Face for model hosting
## System Requirements
- python >= 3.10
- ffmpeg
## Installation

### Step 1: Clone the repository

```bash
git clone https://github.com/sofi444/realtime-transcription-fastrtc
cd realtime-transcription-fastrtc
```
### Step 2: Set up environment

Choose your preferred package manager:

#### 📦 Using UV (recommended)

```bash
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -r requirements.txt
```

#### 🐍 Using pip

```bash
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
### Step 3: Install ffmpeg

#### 🍎 macOS

```bash
brew install ffmpeg
```

#### 🐧 Linux (Ubuntu/Debian)

```bash
sudo apt update
sudo apt install ffmpeg
```
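If you're not sure whether ffmpeg is already available, a quick check from Python (illustrative, not part of the repo):

```python
import shutil

# ffmpeg must be on PATH for audio decoding to work.
if shutil.which("ffmpeg") is None:
    raise SystemExit("ffmpeg not found on PATH; install it before launching the app")
print("ffmpeg found at:", shutil.which("ffmpeg"))
```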
### Step 4: Configure environment

Create a `.env` file in the project root:

```env
UI_MODE=fastapi
APP_MODE=local
SERVER_NAME=localhost
```
- `UI_MODE`: controls which interface to use. If set to `gradio`, the app launches with Gradio's default UI; if set to anything else (e.g. `fastapi`), it uses the `index.html` file in the root directory as the UI, which you can customise as you want (default: `fastapi`). A sketch of how these variables might be consumed follows this list.
- `APP_MODE`: ignore this if running only locally. If you're deploying, e.g. on Spaces, you need to configure a TURN server; in that case, set it to `deployed` and follow the instructions here (default: `local`).
- `MODEL_ID`: HF model identifier for the ASR model you want to use (see here) (default: `openai/whisper-large-v3-turbo`).
- `SERVER_NAME`: host to bind to (default: `localhost`).
- `PORT`: port number (default: `7860`).
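As a rough sketch of how these variables might be read at startup (the actual code in `main.py` may differ; `python-dotenv` is an assumption here):

```python
import os

from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # read the .env file in the project root

MODEL_ID = os.getenv("MODEL_ID", "openai/whisper-large-v3-turbo")
APP_MODE = os.getenv("APP_MODE", "local")
UI_MODE = os.getenv("UI_MODE", "fastapi")
SERVER_NAME = os.getenv("SERVER_NAME", "localhost")
PORT = int(os.getenv("PORT", "7860"))

if UI_MODE == "gradio":
    ...  # launch Gradio's default UI
else:
    ...  # serve the custom index.html via FastAPI
```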
### Step 5: Launch the application

```bash
python main.py
```

Click on the URL that pops up (e.g. https://localhost:7860) to start using the app!
## Whisper

Choose the Whisper model version you want to use. See all available models here; you can of course also use a non-Whisper ASR model.

On MPS, I can run `whisper-large-v3-turbo` without problems. This is my current favourite as it's lightweight, performant and multi-lingual!
Adjust the parameters as you like, but remember that for real-time, we want the batch size to be 1 (i.e. start transcribing as soon as a chunk is available).
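As an illustration, loading the model with the `transformers` pipeline might look like this (the device and dtype choices, and the dummy chunk, are assumptions; adjust for your hardware):

```python
import numpy as np
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device="mps",  # use "cuda:0" on NVIDIA GPUs, or "cpu" as a fallback
)

# Dummy one-second chunk standing in for audio received over FastRTC.
audio_chunk = np.zeros(16_000, dtype=np.float32)

# For real time, transcribe each chunk as soon as it arrives (batch size 1)
# rather than batching chunks together.
result = asr({"sampling_rate": 16_000, "raw": audio_chunk})
print(result["text"])
```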
If you want to transcribe different languages, set the language parameter to the target language, otherwise Whisper defaults to translating to English (even if you set `transcribe` as the task).