Update Readme
README.md
@@ -10,4 +10,89 @@ pinned: false
short_description: This tool is intended to help transcribe interviews.
---

# Audio Transcription App

A Gradio-based web application for transcribing audio files (MP3 or M4A) using OpenAI's Whisper model. It is aimed at interviews and other long recordings, and offers optional silence removal and audio chunking.

## Features

- **Multiple Audio File Support**: Process multiple MP3 or M4A files simultaneously
- **Silence Removal**: Option to remove silence from audio to reduce processing time and improve accuracy
- **Audio Chunking**: Split long audio files into manageable chunks for better processing
- **Multiple Language Support**: Supports German (de), English (en), French (fr), Spanish (es), and Italian (it)
- **Multiple Whisper Models**: Choose from various Whisper model sizes (tiny to large-v3-turbo) based on your needs
- **Detailed Output**: Get both full transcriptions and segment-wise transcriptions with timestamps
- **Download Results**: All processed files and transcripts are provided in a convenient ZIP file

## Setup

1. Clone the repository
2. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Make sure you have ffmpeg installed on your system
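
To confirm step 3 before starting the app, a small check along these lines may help. It is only a sketch: the helper names `find_ffmpeg` and `ffmpeg_version` are illustrative and are not part of `app.py`; the only external call is the standard `ffmpeg -version`.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the absolute path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


def ffmpeg_version(path: str) -> str:
    """Return the first line printed by `ffmpeg -version`."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found; install it and make sure it is on your PATH")
    else:
        print(ffmpeg_version(path))
```

If ffmpeg lives outside your PATH, pass its full path to `find_ffmpeg` instead of the default name.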

## Usage

1. Run the application:
   ```bash
   python app.py
   ```
2. Open the local URL printed in the terminal in your web browser
3. Upload your audio file(s)
4. Configure the settings:
   - Enable/disable silence removal
   - Enable/disable audio chunking
   - Select the Whisper model size
   - Choose the target language
5. Click "Process" to start transcription
6. View the results and download the ZIP file containing all processed files

## Settings

### Silence Removal
- **Minimum Silence Length**: 100 to 2000 ms (default: 500 ms)
- **Silence Threshold**: -70 to -30 dB (default: -50 dB)
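
To illustrate how these two settings interact, here is a minimal pure-Python sketch of threshold-based silence detection. It is not the code used by `app.py`: the function names, the 10 ms analysis window, and the assumption that samples are floats normalised to [-1.0, 1.0] are all choices made for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple


def window_dbfs(samples: Sequence[float]) -> float:
    """RMS level of a window in dBFS, for samples normalised to [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def find_silences(
    samples: Sequence[float],
    sample_rate: int,
    min_silence_len_ms: int = 500,
    silence_thresh_db: float = -50.0,
    window_ms: int = 10,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) ranges that stay below the threshold
    for at least min_silence_len_ms."""
    window = max(1, sample_rate * window_ms // 1000)  # samples per window
    silences: List[Tuple[int, int]] = []
    run_start: Optional[int] = None  # start (ms) of the current quiet run
    for offset in range(0, len(samples), window):
        start_ms = offset * 1000 // sample_rate
        if window_dbfs(samples[offset:offset + window]) < silence_thresh_db:
            if run_start is None:
                run_start = start_ms
        else:
            # A loud window ends the quiet run; keep it only if long enough.
            if run_start is not None and start_ms - run_start >= min_silence_len_ms:
                silences.append((run_start, start_ms))
            run_start = None
    end_ms = len(samples) * 1000 // sample_rate
    if run_start is not None and end_ms - run_start >= min_silence_len_ms:
        silences.append((run_start, end_ms))
    return silences


if __name__ == "__main__":
    rate = 1000  # 1 sample per millisecond keeps the arithmetic easy to follow
    tone = [0.5, -0.5]
    audio = tone * 500 + [0.0] * 600 + tone * 100 + [0.0] * 300
    print(find_silences(audio, rate))  # [(1000, 1600)]
```

With the defaults, the 600 ms pause is reported while the trailing 300 ms pause is ignored. Lowering the minimum length to 300 ms would report both, and raising the threshold towards -30 dB would treat quieter background noise as silence too.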

### Chunking
- **Chunk Duration**: 60 to 3600 seconds (default: 600 seconds, i.e. 10 minutes)
- **FFmpeg Path**: Path to ffmpeg executable (default: "ffmpeg")
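
The sketch below shows what a given chunk duration means for a recording and how such a split could be requested from ffmpeg. It is an assumption-laden illustration, not the app's implementation: `chunk_bounds` and `ffmpeg_segment_command` are hypothetical helpers, and using ffmpeg's segment muxer with stream copy is just one possible way to split a file. The command is only built, never executed.

```python
import math
from typing import List, Tuple


def chunk_bounds(total_seconds: float, chunk_seconds: int = 600) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if not 60 <= chunk_seconds <= 3600:
        raise ValueError("chunk_seconds must be between 60 and 3600")
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min((i + 1) * chunk_seconds, total_seconds))
        for i in range(count)
    ]


def ffmpeg_segment_command(
    source: str,
    pattern: str,
    chunk_seconds: int = 600,
    ffmpeg_path: str = "ffmpeg",
) -> List[str]:
    """Build (but do not run) an ffmpeg call that splits `source` into chunks."""
    return [
        ffmpeg_path, "-i", source,
        "-f", "segment",                      # segment muxer
        "-segment_time", str(chunk_seconds),  # target chunk length in seconds
        "-c", "copy",                         # copy the stream, no re-encoding
        pattern,                              # e.g. "chunk_%03d.mp3"
    ]


if __name__ == "__main__":
    # A 25-minute recording with the default 10-minute chunks.
    print(chunk_bounds(1500))  # [(0, 600), (600, 1200), (1200, 1500)]
    print(" ".join(ffmpeg_segment_command("interview.mp3", "chunk_%03d.mp3")))
```

The file names `interview.mp3` and `chunk_%03d.mp3` are placeholders. With stream copy, ffmpeg cuts at packet boundaries, so actual chunk lengths can differ slightly from the requested duration.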

### Transcription
- **Model Size**: Choose from tiny, base, small, medium, large, large-v2, large-v3, turbo, or large-v3-turbo
- **Language**: German (de), English (en), French (fr), Spanish (es), Italian (it)
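
If you drive the transcription from your own script rather than the web UI, it can be useful to reject unsupported values early. The following is a hedged sketch: the constants mirror the two lists above, while the function name `validate_transcription_settings` is invented for this example and does not exist in `app.py`.

```python
from typing import Tuple

# Values copied from the settings listed above.
MODEL_SIZES = (
    "tiny", "base", "small", "medium", "large",
    "large-v2", "large-v3", "turbo", "large-v3-turbo",
)
LANGUAGES = {
    "de": "German",
    "en": "English",
    "fr": "French",
    "es": "Spanish",
    "it": "Italian",
}


def validate_transcription_settings(model_size: str, language: str) -> Tuple[str, str]:
    """Normalise and check the two transcription settings.

    Returns the cleaned (model_size, language) pair or raises ValueError.
    """
    model = model_size.strip().lower()
    lang = language.strip().lower()
    if model not in MODEL_SIZES:
        raise ValueError(f"unknown model size {model_size!r}; choose one of {', '.join(MODEL_SIZES)}")
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language {language!r}; choose one of {', '.join(LANGUAGES)}")
    return model, lang


if __name__ == "__main__":
    print(validate_transcription_settings(" Large-v3 ", "DE"))  # ('large-v3', 'de')
```

Smaller models generally run faster and use less memory, while larger ones tend to be more accurate, so the model size is the main knob for trading speed against quality.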

## Output

- **Full Transcription**: Complete text of the audio file
- **Segmented Transcription**: Text segments with timestamps
- **ZIP File**: Contains:
  - Processed audio files
  - Individual transcript files
  - Combined transcript file
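
As a rough picture of how such output could be assembled, the sketch below formats timestamped segments and bundles transcripts into a ZIP archive. Everything here is an assumption for illustration: the segment dictionaries are assumed to carry `start`, `end` (in seconds) and `text` keys, and the file names inside the archive are made up, so the app's real layout may differ.

```python
import io
import zipfile
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS (fractions are dropped)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: List[Dict]) -> str:
    """Render one line per segment: [start - end] text."""
    return "\n".join(
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    )


def build_zip(transcripts: Dict[str, str]) -> bytes:
    """Bundle per-file transcripts plus one combined transcript into a ZIP in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in transcripts.items():
            archive.writestr(f"{name}.txt", text)
        combined = "\n\n".join(f"{name}\n{text}" for name, text in transcripts.items())
        archive.writestr("combined_transcript.txt", combined)
    return buffer.getvalue()


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 4.2, "text": " Welcome to the interview."},
        {"start": 3725.4, "end": 3730.0, "text": " Thank you for your time."},
    ]
    text = format_segments(segments)
    print(text)
    with open("results.zip", "wb") as handle:
        handle.write(build_zip({"interview1": text}))
```

Building the archive in memory first keeps the helper easy to reuse, for example when the bytes are handed to a web framework as a download instead of being written to disk.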

## Deployment on Hugging Face Spaces

1. Create a new Space on Hugging Face
2. Choose "Gradio" as the SDK
3. Upload the following files:
   - app.py
   - requirements.txt
4. The app will automatically deploy and be available at your Space's URL

## Requirements

- Python 3.7+
- ffmpeg
- See requirements.txt for Python package dependencies

## License

This project is open source and available under the MIT License.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper)
- [Gradio](https://gradio.app/)
- [FFmpeg](https://ffmpeg.org/)