# Real-time Speech Transcription with FastRTC and Whisper

This application provides real-time speech transcription using FastRTC for audio streaming and Whisper for speech recognition.
## Features

- Real-time audio streaming
- Voice Activity Detection (VAD)
- Multi-language support
- Low latency transcription
## Usage

1. Click the microphone button to start recording
2. Speak into your microphone
3. See your speech transcribed in real-time
4. Click the microphone button again to stop recording
## Technical Details

- Uses FastRTC for WebRTC streaming
- Powered by the Whisper large-v3-turbo model
- Voice Activity Detection for optimal transcription
- FastAPI backend with WebSocket support
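One step in this pipeline can be sketched in plain Python: converting a raw audio chunk from the stream into the normalised float samples that Whisper-style models expect. This is illustrative only; the helper name is hypothetical, and it assumes 16-bit little-endian PCM input on a little-endian host (the actual app may use numpy for this instead).

```python
import array

def pcm16_to_float(chunk: bytes) -> list[float]:
    """Hypothetical helper: convert a 16-bit PCM audio chunk into
    floats in [-1.0, 1.0], the input range Whisper-style models expect.
    Assumes the host byte order matches the stream (little-endian)."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(chunk)
    return [s / 32768.0 for s in samples]
```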
## Environment Variables

The following environment variables can be configured:

- `MODEL_ID`: Hugging Face model ID (default: "openai/whisper-large-v3-turbo")
- `APP_MODE`: Set to "deployed" for Hugging Face Spaces
- `UI_MODE`: Set to "fastapi" for the custom UI
## Credits

- [FastRTC](https://fastrtc.org/) for WebRTC streaming
- [Whisper](https://github.com/openai/whisper) for speech recognition
- [Hugging Face](https://huggingface.co/) for model hosting
## System Requirements

- Python >= 3.10
- ffmpeg
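The requirements above can be checked quickly before installing (assuming `python3` and `ffmpeg` should both be on your `PATH`):

```shell
# Confirm the Python interpreter is 3.10 or newer
python3 -c 'import sys; assert sys.version_info >= (3, 10), sys.version'
# Confirm ffmpeg is installed and on PATH
command -v ffmpeg || echo "ffmpeg not found: install it before running the app"
```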
## Installation

### Step 1: Clone the repository

```bash
git clone https://github.com/sofi444/realtime-transcription-fastrtc
cd realtime-transcription-fastrtc
```
### Step 2: Set up environment

Choose your preferred package manager:

<details>
<summary>📦 Using UV (recommended)</summary>

[Install `uv`](https://docs.astral.sh/uv/getting-started/installation/)

```bash
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -r requirements.txt
```
</details>
<details>
<summary>🐍 Using pip</summary>

```bash
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
</details>
### Step 3: Install ffmpeg

<details>
<summary>🍎 macOS</summary>

```bash
brew install ffmpeg
```
</details>
<details>
<summary>🐧 Linux (Ubuntu/Debian)</summary>

```bash
sudo apt update
sudo apt install ffmpeg
```
</details>
### Step 4: Configure environment

Create a `.env` file in the project root:

```env
UI_MODE=fastapi
APP_MODE=local
SERVER_NAME=localhost
```
- **UI_MODE**: controls which interface is used. Set it to `gradio` to launch the app with Gradio's default UI; any other value (e.g. `fastapi`) serves the `index.html` file in the root directory, which you can customise as you like (default: `fastapi`).
- **APP_MODE**: ignore this when running locally. If you deploy the app (e.g. on Spaces), you need to configure a TURN server: set it to `deployed` and follow the instructions [here](https://fastrtc.org/deployment/) (default: `local`).
- **MODEL_ID**: Hugging Face model identifier for the ASR model you want to use (see [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending)) (default: `openai/whisper-large-v3-turbo`).
- **SERVER_NAME**: host to bind to (default: `localhost`).
- **PORT**: port number (default: `7860`).
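These variables could be read in `main.py` along these lines. This is a minimal sketch using only the standard library; the actual app may first load the `.env` file into the environment with a package such as `python-dotenv`.

```python
import os

# Defaults mirror the values documented above; a .env file, if used,
# must be loaded into the environment before this point.
MODEL_ID = os.getenv("MODEL_ID", "openai/whisper-large-v3-turbo")
APP_MODE = os.getenv("APP_MODE", "local")
UI_MODE = os.getenv("UI_MODE", "fastapi")
SERVER_NAME = os.getenv("SERVER_NAME", "localhost")
PORT = int(os.getenv("PORT", "7860"))
```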
### Step 5: Launch the application

```bash
python main.py
```

Click the URL that appears in the terminal (e.g. http://localhost:7860) to start using the app!
### Whisper

Choose the Whisper model version you want to use. See all available models [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending&search=whisper) - you can of course also use a non-Whisper ASR model.

On MPS, I can run `whisper-large-v3-turbo` without problems. It's my current favourite as it's lightweight, performant, and multilingual!

Adjust the parameters as you like, but remember that for real-time transcription we want a batch size of 1 (i.e. start transcribing as soon as a chunk is available).

If you want to transcribe a language other than English, set the language parameter to the target language; otherwise Whisper tends to translate into English (even if you set `transcribe` as the task).