# Real-time Speech Transcription with FastRTC and Whisper

This application provides real-time speech transcription using FastRTC for audio streaming and Whisper for speech recognition.

## Features

- Real-time audio streaming
- Voice Activity Detection (VAD)
- Multi-language support
- Low-latency transcription
## Usage

1. Click the microphone button to start recording
2. Speak into your microphone
3. See your speech transcribed in real-time
4. Click the microphone button again to stop recording
## Technical Details

- Uses FastRTC for WebRTC streaming
- Powered by the Whisper `large-v3-turbo` model
- Voice Activity Detection for optimal transcription
- FastAPI backend with WebSocket support
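To give a sense of how these pieces fit together, here is a minimal sketch (not the exact code in `main.py`): it assumes the `fastrtc` `Stream`/`ReplyOnPause` API and a `transformers` ASR pipeline, and the handler below is purely illustrative.

```python
import numpy as np
from fastapi import FastAPI
from fastrtc import AdditionalOutputs, ReplyOnPause, Stream
from transformers import pipeline

# ASR pipeline; see the Whisper section below for model options.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")

def transcribe(audio: tuple[int, np.ndarray]):
    """Called by ReplyOnPause whenever VAD detects a pause in the incoming speech."""
    sample_rate, samples = audio
    # Assuming 16-bit PCM from the browser; convert to float32 for the model.
    raw = samples.astype(np.float32).flatten() / 32768.0
    text = asr({"sampling_rate": sample_rate, "raw": raw})["text"]
    yield AdditionalOutputs(text)  # send the transcript back as non-audio output

stream = Stream(handler=ReplyOnPause(transcribe), modality="audio", mode="send-receive")

app = FastAPI()
stream.mount(app)  # exposes the WebRTC/WebSocket endpoints on the FastAPI app
```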
## Environment Variables

The following environment variables can be configured:

- `MODEL_ID`: Hugging Face model ID (default: `openai/whisper-large-v3-turbo`)
- `APP_MODE`: Set to `deployed` for Hugging Face Spaces
- `UI_MODE`: Set to `fastapi` for the custom UI
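For example, the application can pick these values up at startup with standard-library calls; this is just an illustrative sketch, with defaults matching the list above:

```python
import os

# Defaults mirror the values documented above; override via environment variables.
MODEL_ID = os.getenv("MODEL_ID", "openai/whisper-large-v3-turbo")
APP_MODE = os.getenv("APP_MODE", "local")
UI_MODE = os.getenv("UI_MODE", "fastapi")
```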
## Credits

- [FastRTC](https://fastrtc.org/) for WebRTC streaming
- [Whisper](https://github.com/openai/whisper) for speech recognition
- [Hugging Face](https://huggingface.co/) for model hosting

## System Requirements

- Python >= 3.10
- ffmpeg
## Installation

### Step 1: Clone the repository

```bash
git clone https://github.com/sofi444/realtime-transcription-fastrtc
cd realtime-transcription-fastrtc
```
### Step 2: Set up environment

Choose your preferred package manager:

<details>
<summary>Using UV (recommended)</summary>

[Install `uv`](https://docs.astral.sh/uv/getting-started/installation/)

```bash
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -r requirements.txt
```
</details>

<details>
<summary>Using pip</summary>

```bash
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
</details>
### Step 3: Install ffmpeg

<details>
<summary>macOS</summary>

```bash
brew install ffmpeg
```
</details>

<details>
<summary>Linux (Ubuntu/Debian)</summary>

```bash
sudo apt update
sudo apt install ffmpeg
```
</details>
### Step 4: Configure environment

Create a `.env` file in the project root:

```env
UI_MODE=fastapi
APP_MODE=local
SERVER_NAME=localhost
```

- **UI_MODE**: controls which interface is used. If set to `gradio`, the app launches with Gradio's default UI. If set to anything else (e.g. `fastapi`), the UI is built from the `index.html` file in the root directory, which you can customise as you like (default: `fastapi`).
- **APP_MODE**: can be ignored when running only locally. If you are deploying, e.g. on Spaces, you need to configure a TURN server; in that case set it to `deployed` and follow the instructions [here](https://fastrtc.org/deployment/) (default: `local`).
- **MODEL_ID**: Hugging Face model identifier for the ASR model you want to use (see the options [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending)) (default: `openai/whisper-large-v3-turbo`).
- **SERVER_NAME**: host to bind to (default: `localhost`).
- **PORT**: port number (default: `7860`).
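As a rough sketch of how these settings could drive startup (illustrative only, not the exact `main.py`; it assumes the `.env` file is loaded with `python-dotenv` and that `stream` and `app` are the FastRTC stream and FastAPI app from the sketch in the Technical Details section):

```python
import os

import uvicorn
from dotenv import load_dotenv

load_dotenv()  # read the .env file from the project root

SERVER_NAME = os.getenv("SERVER_NAME", "localhost")
PORT = int(os.getenv("PORT", "7860"))

if os.getenv("UI_MODE", "fastapi") == "gradio":
    # Gradio's default UI for the audio stream
    stream.ui.launch(server_name=SERVER_NAME, server_port=PORT)
else:
    # Custom UI: the FastAPI app serves index.html and the WebRTC endpoints
    uvicorn.run(app, host=SERVER_NAME, port=PORT)
```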
### Step 5: Launch the application

```bash
python main.py
```

Click the URL that appears in the terminal (e.g. `https://localhost:7860`) to start using the app!
### Whisper

Choose the Whisper model version you want to use. See the full list [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending&search=whisper) - you can of course also use a non-Whisper ASR model.

On MPS, I can run `whisper-large-v3-turbo` without problems. This is my current favourite as it's lightweight, performant and multi-lingual!

Adjust the parameters as you like, but remember that for real-time use we want a batch size of 1, i.e. transcription starts as soon as a chunk is available.

If you want to transcribe languages other than English, set the language parameter to the target language; otherwise Whisper defaults to translating into English (even if you set `transcribe` as the task).
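For illustration, a transcription call with an explicit language might look like this (a minimal sketch using the `transformers` pipeline; the file name is a placeholder and `device`/`torch_dtype` should be adjusted for your hardware):

```python
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device="mps",  # or "cuda:0" / "cpu"
)

# batch_size=1 keeps latency low: each chunk is transcribed as soon as it arrives.
result = asr(
    "sample.wav",  # placeholder path; in the app the audio comes from the stream
    batch_size=1,
    generate_kwargs={"language": "german", "task": "transcribe"},
)
print(result["text"])
```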