# Real-time Speech Transcription with FastRTC and Whisper

This application provides real-time speech transcription using FastRTC for audio streaming and Whisper for speech recognition.

## Features

- Real-time audio streaming
- Voice Activity Detection (VAD)
- Multi-language support
- Low-latency transcription
## Usage

1. Click the microphone button to start recording
2. Speak into your microphone
3. See your speech transcribed in real-time
4. Click the microphone button again to stop recording
## Technical Details

- Uses FastRTC for WebRTC streaming
- Powered by the Whisper `large-v3-turbo` model
- Voice Activity Detection for optimal transcription
- FastAPI backend with WebSocket support
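To give a sense of how these pieces fit together, here is a minimal sketch (not the exact code in `main.py`): it assumes the `fastrtc` `Stream`/`ReplyOnPause` API and a `transformers` ASR pipeline, and the handler below is purely illustrative.

```python
import numpy as np
from fastapi import FastAPI
from fastrtc import AdditionalOutputs, ReplyOnPause, Stream
from transformers import pipeline

# ASR pipeline; see the Whisper section below for model options.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")

def transcribe(audio: tuple[int, np.ndarray]):
    """Called by ReplyOnPause whenever VAD detects a pause in the incoming speech."""
    sample_rate, samples = audio
    # Assuming 16-bit PCM from the browser; convert to float32 for the model.
    raw = samples.astype(np.float32).flatten() / 32768.0
    text = asr({"sampling_rate": sample_rate, "raw": raw})["text"]
    yield AdditionalOutputs(text)  # send the transcript back as non-audio output

stream = Stream(handler=ReplyOnPause(transcribe), modality="audio", mode="send-receive")

app = FastAPI()
stream.mount(app)  # exposes the WebRTC/WebSocket endpoints on the FastAPI app
```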
## Environment Variables

The following environment variables can be configured:

- `MODEL_ID`: Hugging Face model ID (default: `openai/whisper-large-v3-turbo`)
- `APP_MODE`: Set to `deployed` for Hugging Face Spaces
- `UI_MODE`: Set to `fastapi` for the custom UI
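For example, the application can pick these values up at startup with standard-library calls; this is just an illustrative sketch, with defaults matching the list above:

```python
import os

# Defaults mirror the values documented above; override via environment variables.
MODEL_ID = os.getenv("MODEL_ID", "openai/whisper-large-v3-turbo")
APP_MODE = os.getenv("APP_MODE", "local")
UI_MODE = os.getenv("UI_MODE", "fastapi")
```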
## Credits

- [FastRTC](https://fastrtc.org/) for WebRTC streaming
- [Whisper](https://github.com/openai/whisper) for speech recognition
- [Hugging Face](https://huggingface.co/) for model hosting

## System Requirements

- Python >= 3.10
- ffmpeg
## Installation

### Step 1: Clone the repository

```bash
git clone https://github.com/sofi444/realtime-transcription-fastrtc
cd realtime-transcription-fastrtc
```
### Step 2: Set up environment

Choose your preferred package manager:

<details>
<summary>Using UV (recommended)</summary>

[Install `uv`](https://docs.astral.sh/uv/getting-started/installation/)

```bash
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -r requirements.txt
```
</details>

<details>
<summary>Using pip</summary>

```bash
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
</details>
### Step 3: Install ffmpeg

<details>
<summary>macOS</summary>

```bash
brew install ffmpeg
```
</details>

<details>
<summary>Linux (Ubuntu/Debian)</summary>

```bash
sudo apt update
sudo apt install ffmpeg
```
</details>
### Step 4: Configure environment

Create a `.env` file in the project root:

```env
UI_MODE=fastapi
APP_MODE=local
SERVER_NAME=localhost
```

- **UI_MODE**: controls which interface is used. If set to `gradio`, the app launches with Gradio's default UI. If set to anything else (e.g. `fastapi`), the UI is built from the `index.html` file in the root directory, which you can customise as you like (default: `fastapi`).
- **APP_MODE**: can be ignored when running only locally. If you are deploying, e.g. on Spaces, you need to configure a TURN server; in that case set it to `deployed` and follow the instructions [here](https://fastrtc.org/deployment/) (default: `local`).
- **MODEL_ID**: Hugging Face model identifier for the ASR model you want to use (see the options [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending)) (default: `openai/whisper-large-v3-turbo`).
- **SERVER_NAME**: host to bind to (default: `localhost`).
- **PORT**: port number (default: `7860`).
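As a rough sketch of how these settings could drive startup (illustrative only, not the exact `main.py`; it assumes the `.env` file is loaded with `python-dotenv` and that `stream` and `app` are the FastRTC stream and FastAPI app from the sketch in the Technical Details section):

```python
import os

import uvicorn
from dotenv import load_dotenv

load_dotenv()  # read the .env file from the project root

SERVER_NAME = os.getenv("SERVER_NAME", "localhost")
PORT = int(os.getenv("PORT", "7860"))

if os.getenv("UI_MODE", "fastapi") == "gradio":
    # Gradio's default UI for the audio stream
    stream.ui.launch(server_name=SERVER_NAME, server_port=PORT)
else:
    # Custom UI: the FastAPI app serves index.html and the WebRTC endpoints
    uvicorn.run(app, host=SERVER_NAME, port=PORT)
```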
### Step 5: Launch the application

```bash
python main.py
```

Click the URL that appears in the terminal (e.g. `https://localhost:7860`) to start using the app!
### Whisper

Choose the Whisper model version you want to use. See the full list [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending&search=whisper) - you can of course also use a non-Whisper ASR model.

On MPS, I can run `whisper-large-v3-turbo` without problems. This is my current favourite as it's lightweight, performant and multi-lingual!

Adjust the parameters as you like, but remember that for real-time use we want a batch size of 1, i.e. transcription starts as soon as a chunk is available.

If you want to transcribe languages other than English, set the language parameter to the target language; otherwise Whisper defaults to translating into English (even if you set `transcribe` as the task).
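For illustration, a transcription call with an explicit language might look like this (a minimal sketch using the `transformers` pipeline; the file name is a placeholder and `device`/`torch_dtype` should be adjusted for your hardware):

```python
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device="mps",  # or "cuda:0" / "cpu"
)

# batch_size=1 keeps latency low: each chunk is transcribed as soon as it arrives.
result = asr(
    "sample.wav",  # placeholder path; in the app the audio comes from the stream
    batch_size=1,
    generate_kwargs={"language": "german", "task": "transcribe"},
)
print(result["text"])
```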