YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

⚑️ YouTube Video Transcriber with Subtitles

Python MLX Parakeet License

High-performance YouTube video transcription with perfectly timed subtitles using Apple MLX and Parakeet v2

πŸš€ Try it Now β€’ ✨ Features β€’ πŸ“– Usage β€’ πŸ› οΈ Installation

🎯 What This Does

Transform any YouTube video segment into a transcribed video with perfectly synchronized subtitles in seconds! Built for Apple Silicon with cutting-edge speech recognition.

⚑️ Lightning Fast

  • ~0.3 seconds to transcribe 1-minute videos
  • Apple MLX optimized for M1/M2/M3 chips
  • Real-time processing with chunked inference

🎯 Pixel-Perfect Timing

  • Sentence-level timing from Parakeet v2
  • No more early/late subtitles - perfect sync
  • Natural speech patterns preserved

✨ Features

🎬 Smart Video Processing

  • YouTube URL input - paste any video link
  • Precise time trimming - specify start/end times (MM:SS or HH:MM:SS)
  • Auto quality selection - best available video/audio

🎀 Advanced Speech Recognition

  • Parakeet TDT v2 model - NVIDIA's latest ASR
  • Conformer + RNNT architecture - not slow transformers
  • Chunked processing - handles long videos efficiently

πŸ“ Subtitle Magic

  • Toggle ON/OFF - choose subtitled or clean video
  • Accurate timing - uses real speech timestamps
  • SRT format - standard subtitle file creation
  • Burned-in subtitles - embedded directly in video

🎨 Beautiful Interface

  • Gradio web UI - clean, modern design
  • Real-time progress - see processing status
  • Dual output - video player + text transcript

πŸš€ Quick Start

1. Clone & Setup

git clone https://github.com/yourusername/youtube-transcriber-subtitles
cd youtube-transcriber-subtitles
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Launch App

python app.py

3. Open Browser

Navigate to http://127.0.0.1:7860

4. Process Video

  1. Paste YouTube URL
  2. Set start/end times (e.g., "1:23" to "2:45")
  3. Toggle subtitles ON/OFF
  4. Click "Process Video"
  5. Download your result!

πŸ“– Usage Examples

πŸŽ“ Educational Content

URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
Start: 01:30
End: 03:45
Subtitles: βœ… ON
β†’ Perfect for lecture clips with readable subtitles

πŸŽ™οΈ Podcast Highlights

URL: https://www.youtube.com/watch?v=example123
Start: 15:20
End: 18:50
Subtitles: ❌ OFF
β†’ Clean audio clips without visual distractions

πŸ“Ί Social Media Clips

URL: https://www.youtube.com/watch?v=viral456
Start: 00:10
End: 01:00
Subtitles: βœ… ON
β†’ Engaging clips with perfectly timed captions

πŸ› οΈ Installation

Prerequisites

  • Python 3.8+
  • Apple Silicon Mac (M1/M2/M3) - for MLX acceleration
  • ffmpeg - for video processing
  • yt-dlp - for YouTube downloads

Install ffmpeg (macOS)

brew install ffmpeg

Install Dependencies

pip install -r requirements.txt

Key Dependencies

  • parakeet-mlx - Apple MLX speech recognition
  • gradio - Web interface
  • yt-dlp - YouTube downloader
  • mlx - Apple's ML framework

πŸ”§ Technical Details

🧠 Model Architecture

  • Parakeet TDT 0.6B v2 - 600M parameter model
  • Conformer encoder - superior to transformers on Mac
  • RNNT decoder - streaming-friendly architecture
  • MLX optimized - native Apple Silicon acceleration

βš™οΈ Processing Pipeline

  1. Download video using yt-dlp
  2. Trim to specified time range with ffmpeg
  3. Extract audio at 16kHz mono WAV
  4. Transcribe with chunked inference (120s chunks, 5s overlap)
  5. Generate SRT subtitles with real timing
  6. Embed subtitles using ffmpeg (optional)
  7. Return video + transcript

πŸ“Š Performance

  • Speed: ~5-10x faster than real-time
  • Memory: Efficient chunked processing
  • Quality: State-of-the-art accuracy
  • Compatibility: Apple Silicon optimized

🎨 Interface Preview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  ⚑️ YouTube Video Transcriber with Subtitles    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  YouTube URL: [https://youtube.com/watch?v=...] β”‚
β”‚  Start Time:  [01:23]    End Time: [02:45]      β”‚
β”‚  Add Subtitles: β˜‘οΈ ON                           β”‚
β”‚  [πŸš€ Process Video]                             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  πŸ“Ή Video Player                                β”‚
β”‚  πŸ“ Full Transcription                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”„ File Structure

youtube-transcriber-subtitles/
β”œβ”€β”€ app.py                 # Main Gradio application
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ README.md             # This awesome README
β”œβ”€β”€ temp/                 # Working directory (auto-created)
└── venv/                 # Virtual environment

Ultra-clean codebase - only 3 essential files!

πŸš€ Advanced Usage

Custom Chunking

# Modify in app.py for different chunk sizes
result = MODEL.transcribe(
    audio_file,
    chunk_duration=60,   # Smaller chunks for faster processing
    overlap_duration=3   # Less overlap for speed
)

Subtitle Styling

# Add custom ffmpeg subtitle styling
subtitle_command = [
    "ffmpeg", "-i", video,
    "-vf", f"subtitles={srt}:force_style='FontSize=20,PrimaryColour=&Hffff00'",
    output, "-y"
]

🀝 Contributing

We love contributions! Here's how to help:

  1. 🍴 Fork the repository
  2. 🌟 Create a feature branch
  3. ✨ Make your improvements
  4. πŸ§ͺ Test thoroughly
  5. πŸ“€ Submit a pull request

Ideas for Contributions

  • 🎨 Custom subtitle styling options
  • 🌍 Multi-language support
  • πŸ“± Mobile-friendly interface
  • 🎡 Audio-only processing mode
  • πŸ“Š Batch processing for multiple videos

πŸ“„ License

MIT License - feel free to use in your projects!

πŸ™ Acknowledgments

  • NVIDIA - Parakeet speech recognition models
  • Apple - MLX framework for efficient inference
  • Gradio - Beautiful web interfaces made simple
  • ffmpeg - The Swiss Army knife of multimedia

πŸ“ž Support

Having issues? We're here to help!

  • πŸ› Bug reports: Open an issue
  • πŸ’‘ Feature requests: Start a discussion
  • πŸ“– Documentation: Check this README first
  • πŸ’¬ Community: Join our discussions

⭐ Star this repo if it helped you create amazing transcribed videos! ⭐

Made with ❀️ for the Apple Silicon community

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support