on1onmangoes committed on
Commit
4e0f4ac
·
verified ·
1 Parent(s): 19ec9d7

Upload README.md with huggingface_hub

Files changed (1)
README.md +119 -12
README.md CHANGED
@@ -1,12 +1,119 @@
- ---
- title: Realtime Transcription
- emoji: 🐒
- colorFrom: indigo
- colorTo: blue
- sdk: gradio
- sdk_version: 5.23.3
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Real-time Speech Transcription with FastRTC and Whisper
+
+ This application provides real-time speech transcription using FastRTC for audio streaming and Whisper for speech recognition.
+
+ ## Features
+ - Real-time audio streaming
+ - Voice Activity Detection (VAD)
+ - Multi-language support
+ - Low-latency transcription
+
+ ## Usage
+ 1. Click the microphone button to start recording
+ 2. Speak into your microphone
+ 3. See your speech transcribed in real time
+ 4. Click the microphone button again to stop recording
+
+ ## Technical Details
+ - Uses FastRTC for WebRTC streaming
+ - Powered by the Whisper large-v3-turbo model
+ - Voice Activity Detection, so audio is transcribed only when speech is present
+ - FastAPI backend with WebSocket support
+
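+ As a rough sketch of how these pieces fit together, the following shows a FastRTC audio stream mounted on a FastAPI app. It is illustrative only: the handler name `transcribe` and the exact wiring are assumptions, not this app's actual code.
+
+ ```python
+ # Illustrative sketch (not this app's exact code): a FastRTC stream whose
+ # handler transcribes each pause-delimited utterance with Whisper.
+ from fastapi import FastAPI
+ from fastrtc import AdditionalOutputs, ReplyOnPause, Stream
+ from transformers import pipeline
+
+ asr = pipeline("automatic-speech-recognition",
+                model="openai/whisper-large-v3-turbo")
+
+ def transcribe(audio):
+     sample_rate, array = audio
+     # FastRTC delivers int16 PCM; Whisper expects float32 in [-1, 1]
+     array = array.flatten().astype("float32") / 32768.0
+     text = asr({"sampling_rate": sample_rate, "raw": array})["text"]
+     yield AdditionalOutputs(text)  # pass the transcript back to the UI
+
+ app = FastAPI()
+ # ReplyOnPause runs VAD and invokes the handler when the speaker pauses
+ stream = Stream(ReplyOnPause(transcribe), modality="audio", mode="send-receive")
+ stream.mount(app)  # exposes the WebRTC endpoints on the FastAPI app
+ ```
+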
+ ## Environment Variables
+ The following environment variables can be configured:
+ - `MODEL_ID`: Hugging Face model ID (default: `openai/whisper-large-v3-turbo`)
+ - `APP_MODE`: set to `deployed` for Hugging Face Spaces
+ - `UI_MODE`: set to `fastapi` for the custom UI
+
+ ## Credits
+ - [FastRTC](https://fastrtc.org/) for WebRTC streaming
+ - [Whisper](https://github.com/openai/whisper) for speech recognition
+ - [Hugging Face](https://huggingface.co/) for model hosting
+
+ ## System Requirements
+ - Python >= 3.10
+ - ffmpeg
+
+ ## Installation
+
+ ### Step 1: Clone the repository
+ ```bash
+ git clone https://github.com/sofi444/realtime-transcription-fastrtc
+ cd realtime-transcription-fastrtc
+ ```
+
+ ### Step 2: Set up environment
+ Choose your preferred package manager:
+
+ <details>
+ <summary>📦 Using UV (recommended)</summary>
+
+ [Install `uv`](https://docs.astral.sh/uv/getting-started/installation/)
+
+ ```bash
+ uv venv --python 3.11 && source .venv/bin/activate
+ uv pip install -r requirements.txt
+ ```
+ </details>
+
+ <details>
+ <summary>🐍 Using pip</summary>
+
+ ```bash
+ python -m venv .venv && source .venv/bin/activate
+ pip install --upgrade pip
+ pip install -r requirements.txt
+ ```
+ </details>
+
+ ### Step 3: Install ffmpeg
+ <details>
+ <summary>🍎 macOS</summary>
+
+ ```bash
+ brew install ffmpeg
+ ```
+ </details>
+
+ <details>
+ <summary>🐧 Linux (Ubuntu/Debian)</summary>
+
+ ```bash
+ sudo apt update
+ sudo apt install ffmpeg
+ ```
+ </details>
+
+ ### Step 4: Configure environment
+ Create a `.env` file in the project root:
+
+ ```env
+ UI_MODE=fastapi
+ APP_MODE=local
+ SERVER_NAME=localhost
+ ```
+
+ - **UI_MODE**: controls which interface is used. If set to `gradio`, the app launches with Gradio's default UI. Any other value (e.g. `fastapi`) serves the `index.html` file in the root directory, which you can customise as you like (default: `fastapi`).
+ - **APP_MODE**: ignore this when running only locally. If you deploy the app, e.g. on Spaces, you need to configure a TURN server: set this to `deployed` and follow the instructions [here](https://fastrtc.org/deployment/) (default: `local`).
+ - **MODEL_ID**: Hugging Face identifier of the ASR model to use (see [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending)) (default: `openai/whisper-large-v3-turbo`).
+ - **SERVER_NAME**: host to bind to (default: `localhost`).
+ - **PORT**: port number (default: `7860`).
+
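+ For reference, a minimal sketch of how these variables might be read at startup (assuming `python-dotenv`; the variable names match the list above, but the actual `main.py` may differ):
+
+ ```python
+ import os
+ from dotenv import load_dotenv
+
+ load_dotenv()  # read .env from the project root
+
+ MODEL_ID = os.getenv("MODEL_ID", "openai/whisper-large-v3-turbo")
+ APP_MODE = os.getenv("APP_MODE", "local")
+ UI_MODE = os.getenv("UI_MODE", "fastapi")
+ SERVER_NAME = os.getenv("SERVER_NAME", "localhost")
+ PORT = int(os.getenv("PORT", "7860"))
+ ```
+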
+ ### Step 5: Launch the application
+ ```bash
+ python main.py
+ ```
+ Click the URL printed in the terminal (e.g. http://localhost:7860) to start using the app!
+
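+ Under the hood, the entry point presumably reduces to something like the following (a sketch only; the module path `app:app` and the exact arguments are assumptions):
+
+ ```python
+ import os
+ import uvicorn
+
+ if __name__ == "__main__":
+     # Bind to the host/port configured in .env (see the Step 4 defaults)
+     uvicorn.run(
+         "app:app",
+         host=os.getenv("SERVER_NAME", "localhost"),
+         port=int(os.getenv("PORT", "7860")),
+     )
+ ```
+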
+ ### Whisper
+
+ Choose the Whisper model version you want to use. See them all [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending&search=whisper) - you can of course also use a non-Whisper ASR model.
+
+ On MPS, I can run `whisper-large-v3-turbo` without problems. This is my current favourite as it's lightweight, performant, and multilingual!
+
+ Adjust the parameters as you like, but remember that for real-time use we want a batch size of 1, i.e. transcription starts as soon as a chunk is available.
+
+ If you want to transcribe different languages, set the language parameter to the target language; otherwise, Whisper defaults to translating to English (even if you set `transcribe` as the task).
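+
+ For illustration, this is roughly what an explicit-language setup looks like with the `transformers` ASR pipeline (a sketch; `device="mps"` assumes Apple Silicon, and the file name is a placeholder):
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ # Sketch: Whisper via the transformers ASR pipeline.
+ # Use device="cuda:0" on NVIDIA GPUs or "cpu" if no accelerator is available.
+ asr = pipeline(
+     "automatic-speech-recognition",
+     model="openai/whisper-large-v3-turbo",
+     torch_dtype=torch.float16,
+     device="mps",
+ )
+
+ # Pin the language and task so Whisper transcribes in the source language
+ # instead of auto-detecting (and possibly translating to English).
+ result = asr(
+     "sample.wav",  # placeholder audio file
+     generate_kwargs={"language": "german", "task": "transcribe"},
+ )
+ print(result["text"])
+ ```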