YAML Metadata
Warning:
empty or missing yaml metadata in repo card
(https://huggingface.co/docs/hub/model-cards#model-card-metadata)
β‘οΈ YouTube Video Transcriber with Subtitles
High-performance YouTube video transcription with perfectly timed subtitles using Apple MLX and Parakeet v2
π Try it Now β’ β¨ Features β’ π Usage β’ π οΈ Installation
π― What This Does
Transform any YouTube video segment into a transcribed video with perfectly synchronized subtitles in seconds! Built for Apple Silicon with cutting-edge speech recognition.
β‘οΈ Lightning Fast
- ~0.3 seconds to transcribe 1-minute videos
- Apple MLX optimized for M1/M2/M3 chips
- Real-time processing with chunked inference
π― Pixel-Perfect Timing
- Sentence-level timing from Parakeet v2
- No more early/late subtitles - perfect sync
- Natural speech patterns preserved
β¨ Features
π¬ Smart Video Processing
- YouTube URL input - paste any video link
- Precise time trimming - specify start/end times (MM:SS or HH:MM:SS)
- Auto quality selection - best available video/audio
π€ Advanced Speech Recognition
- Parakeet TDT v2 model - NVIDIA's latest ASR
- Conformer + RNNT architecture - not slow transformers
- Chunked processing - handles long videos efficiently
π Subtitle Magic
- Toggle ON/OFF - choose subtitled or clean video
- Accurate timing - uses real speech timestamps
- SRT format - standard subtitle file creation
- Burned-in subtitles - embedded directly in video
π¨ Beautiful Interface
- Gradio web UI - clean, modern design
- Real-time progress - see processing status
- Dual output - video player + text transcript
π Quick Start
1. Clone & Setup
git clone https://github.com/yourusername/youtube-transcriber-subtitles
cd youtube-transcriber-subtitles
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
2. Launch App
python app.py
3. Open Browser
Navigate to http://127.0.0.1:7860
4. Process Video
- Paste YouTube URL
- Set start/end times (e.g., "1:23" to "2:45")
- Toggle subtitles ON/OFF
- Click "Process Video"
- Download your result!
π Usage Examples
π Educational Content
URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
Start: 01:30
End: 03:45
Subtitles: β
ON
β Perfect for lecture clips with readable subtitles
ποΈ Podcast Highlights
URL: https://www.youtube.com/watch?v=example123
Start: 15:20
End: 18:50
Subtitles: β OFF
β Clean audio clips without visual distractions
πΊ Social Media Clips
URL: https://www.youtube.com/watch?v=viral456
Start: 00:10
End: 01:00
Subtitles: β
ON
β Engaging clips with perfectly timed captions
π οΈ Installation
Prerequisites
- Python 3.8+
- Apple Silicon Mac (M1/M2/M3) - for MLX acceleration
- ffmpeg - for video processing
- yt-dlp - for YouTube downloads
Install ffmpeg (macOS)
brew install ffmpeg
Install Dependencies
pip install -r requirements.txt
Key Dependencies
parakeet-mlx
- Apple MLX speech recognitiongradio
- Web interfaceyt-dlp
- YouTube downloadermlx
- Apple's ML framework
π§ Technical Details
π§ Model Architecture
- Parakeet TDT 0.6B v2 - 600M parameter model
- Conformer encoder - superior to transformers on Mac
- RNNT decoder - streaming-friendly architecture
- MLX optimized - native Apple Silicon acceleration
βοΈ Processing Pipeline
- Download video using yt-dlp
- Trim to specified time range with ffmpeg
- Extract audio at 16kHz mono WAV
- Transcribe with chunked inference (120s chunks, 5s overlap)
- Generate SRT subtitles with real timing
- Embed subtitles using ffmpeg (optional)
- Return video + transcript
π Performance
- Speed: ~5-10x faster than real-time
- Memory: Efficient chunked processing
- Quality: State-of-the-art accuracy
- Compatibility: Apple Silicon optimized
π¨ Interface Preview
βββββββββββββββββββββββββββββββββββββββββββββββββββ
β β‘οΈ YouTube Video Transcriber with Subtitles β
βββββββββββββββββββββββββββββββββββββββββββββββββββ€
β YouTube URL: [https://youtube.com/watch?v=...] β
β Start Time: [01:23] End Time: [02:45] β
β Add Subtitles: βοΈ ON β
β [π Process Video] β
βββββββββββββββββββββββββββββββββββββββββββββββββββ€
β πΉ Video Player β
β π Full Transcription β
βββββββββββββββββββββββββββββββββββββββββββββββββββ
π File Structure
youtube-transcriber-subtitles/
βββ app.py # Main Gradio application
βββ requirements.txt # Python dependencies
βββ README.md # This awesome README
βββ temp/ # Working directory (auto-created)
βββ venv/ # Virtual environment
Ultra-clean codebase - only 3 essential files!
π Advanced Usage
Custom Chunking
# Modify in app.py for different chunk sizes
result = MODEL.transcribe(
audio_file,
chunk_duration=60, # Smaller chunks for faster processing
overlap_duration=3 # Less overlap for speed
)
Subtitle Styling
# Add custom ffmpeg subtitle styling
subtitle_command = [
"ffmpeg", "-i", video,
"-vf", f"subtitles={srt}:force_style='FontSize=20,PrimaryColour=&Hffff00'",
output, "-y"
]
π€ Contributing
We love contributions! Here's how to help:
- π΄ Fork the repository
- π Create a feature branch
- β¨ Make your improvements
- π§ͺ Test thoroughly
- π€ Submit a pull request
Ideas for Contributions
- π¨ Custom subtitle styling options
- π Multi-language support
- π± Mobile-friendly interface
- π΅ Audio-only processing mode
- π Batch processing for multiple videos
π License
MIT License - feel free to use in your projects!
π Acknowledgments
- NVIDIA - Parakeet speech recognition models
- Apple - MLX framework for efficient inference
- Gradio - Beautiful web interfaces made simple
- ffmpeg - The Swiss Army knife of multimedia
π Support
Having issues? We're here to help!
- π Bug reports: Open an issue
- π‘ Feature requests: Start a discussion
- π Documentation: Check this README first
- π¬ Community: Join our discussions
β Star this repo if it helped you create amazing transcribed videos! β
Made with β€οΈ for the Apple Silicon community
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
π
Ask for provider support