# Real-time Speech Transcription with FastRTC and Whisper
This application provides real-time speech transcription using FastRTC for audio streaming and Whisper for speech recognition.
## Features
- Real-time audio streaming
- Voice Activity Detection (VAD)
- Multi-language support
- Low latency transcription
## Usage
- Click the microphone button to start recording
- Speak into your microphone
- See your speech transcribed in real-time
- Click the microphone button again to stop recording
## Technical Details
- Uses FastRTC for WebRTC streaming
- Powered by Whisper large-v3-turbo model
- Voice Activity Detection for optimal transcription
- FastAPI backend with WebSocket support
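For a sense of how these pieces fit together, here is a minimal sketch of a FastRTC stream mounted on a FastAPI app. The handler body is illustrative (a placeholder standing in for the real Whisper call), not this app's exact code:

```python
import numpy as np
from fastapi import FastAPI
from fastrtc import AdditionalOutputs, ReplyOnPause, Stream

def transcribe(audio: tuple[int, np.ndarray]):
    """Receive an audio chunk and yield its transcription."""
    sample_rate, samples = audio
    text = "..."  # placeholder: run the ASR model on `samples` here
    yield AdditionalOutputs(text)

# ReplyOnPause wraps the handler with voice activity detection:
# the handler fires once the speaker pauses.
stream = Stream(handler=ReplyOnPause(transcribe), modality="audio", mode="send-receive")

app = FastAPI()
stream.mount(app)  # expose the streaming endpoints on the FastAPI app
```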
## Environment Variables
The following environment variables can be configured:
- `MODEL_ID`: Hugging Face model ID (default: `openai/whisper-large-v3-turbo`)
- `APP_MODE`: Set to `deployed` for Hugging Face Spaces
- `UI_MODE`: Set to `fastapi` for the custom UI
## Credits
- FastRTC for WebRTC streaming
- Whisper for speech recognition
- Hugging Face for model hosting
## System Requirements
- python >= 3.10
- ffmpeg
## Installation

### Step 1: Clone the repository

```bash
git clone https://github.com/sofi444/realtime-transcription-fastrtc
cd realtime-transcription-fastrtc
```
### Step 2: Set up environment

Choose your preferred package manager:

#### 📦 Using UV (recommended)

```bash
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -r requirements.txt
```

#### 🐍 Using pip

```bash
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
### Step 3: Install ffmpeg

#### 🍎 macOS

```bash
brew install ffmpeg
```

#### 🐧 Linux (Ubuntu/Debian)

```bash
sudo apt update
sudo apt install ffmpeg
```
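If you're not sure whether ffmpeg is already available, a quick check from Python (illustrative, not part of the repo):

```python
import shutil

# ffmpeg must be on PATH for audio decoding to work.
if shutil.which("ffmpeg") is None:
    raise SystemExit("ffmpeg not found on PATH; install it before launching the app")
print("ffmpeg found at:", shutil.which("ffmpeg"))
```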
### Step 4: Configure environment

Create a `.env` file in the project root:

```env
UI_MODE=fastapi
APP_MODE=local
SERVER_NAME=localhost
```
- `UI_MODE`: controls which interface to use. If set to `gradio`, the app launches with Gradio's default UI; if set to anything else (e.g. `fastapi`), it uses the `index.html` file in the root directory as the UI, which you can customise as you want (default: `fastapi`). A sketch of how these variables might be consumed follows this list.
- `APP_MODE`: ignore this if running only locally. If you're deploying, e.g. on Spaces, you need to configure a TURN server; in that case, set it to `deployed` and follow the instructions here (default: `local`).
- `MODEL_ID`: HF model identifier for the ASR model you want to use (see here) (default: `openai/whisper-large-v3-turbo`).
- `SERVER_NAME`: host to bind to (default: `localhost`).
- `PORT`: port number (default: `7860`).
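As a rough sketch of how these variables might be read at startup (the actual code in `main.py` may differ; `python-dotenv` is an assumption here):

```python
import os

from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # read the .env file in the project root

MODEL_ID = os.getenv("MODEL_ID", "openai/whisper-large-v3-turbo")
APP_MODE = os.getenv("APP_MODE", "local")
UI_MODE = os.getenv("UI_MODE", "fastapi")
SERVER_NAME = os.getenv("SERVER_NAME", "localhost")
PORT = int(os.getenv("PORT", "7860"))

if UI_MODE == "gradio":
    ...  # launch Gradio's default UI
else:
    ...  # serve the custom index.html via FastAPI
```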
### Step 5: Launch the application

```bash
python main.py
```

Click on the URL that pops up (e.g. https://localhost:7860) to start using the app!
## Whisper

Choose the Whisper model version you want to use. See all available models here; you can of course also use a non-Whisper ASR model.

On MPS, I can run `whisper-large-v3-turbo` without problems. This is my current favourite as it's lightweight, performant and multi-lingual!
Adjust the parameters as you like, but remember that for real-time, we want the batch size to be 1 (i.e. start transcribing as soon as a chunk is available).
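As an illustration, loading the model with the `transformers` pipeline might look like this (the device and dtype choices, and the dummy chunk, are assumptions; adjust for your hardware):

```python
import numpy as np
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device="mps",  # use "cuda:0" on NVIDIA GPUs, or "cpu" as a fallback
)

# Dummy one-second chunk standing in for audio received over FastRTC.
audio_chunk = np.zeros(16_000, dtype=np.float32)

# For real time, transcribe each chunk as soon as it arrives (batch size 1)
# rather than batching chunks together.
result = asr({"sampling_rate": 16_000, "raw": audio_chunk})
print(result["text"])
```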
If you want to transcribe different languages, set the language parameter to the target language, otherwise Whisper defaults to translating to English (even if you set `transcribe` as the task).