🌏 Whisper-OCR Multilingual Translation Space 🚀
Welcome! This Space takes English audio, video, images, and PDFs and translates them into Chinese (ZH), Thai (TH), and Russian (RU) within seconds. English is the only supported source language.
VIDraft/voice-trans
✨ Key Features
🎤 Microphone – Record English speech → transcript + 3-language translation
🔊 Audio File – Upload English audio → transcript + translation
🎬 Video File – Auto-extract audio with FFmpeg → transcript + translation
🖼️ Image – Nanonets-OCR pulls text → translation
📄 PDF – Up to 50 pages of text & tables → translation
🔄 Realtime Mode – Flush every 10-15 s; newest lines appear at the top
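To make the Realtime Mode behaviour concrete, here is a minimal sketch of a buffer that collects transcript fragments, flushes them on a fixed interval, and keeps the newest line at the top. The class name `RealtimeLog`, its methods, and the 10-second default are assumptions for illustration, not the Space's actual code; timestamps are passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RealtimeLog:
    """Hypothetical buffer: collect fragments, flush periodically, newest first."""
    flush_interval: float = 10.0  # seconds; the post says 10-15 s
    lines: List[str] = field(default_factory=list)
    _pending: List[str] = field(default_factory=list, init=False)
    _last_flush: Optional[float] = field(default=None, init=False)

    def add(self, text: str, now: float) -> bool:
        """Queue a fragment; return True if this call triggered a flush."""
        if self._last_flush is None:
            self._last_flush = now  # start the timer on the first fragment
        self._pending.append(text)
        if now - self._last_flush >= self.flush_interval:
            self.flush(now)
            return True
        return False

    def flush(self, now: float) -> None:
        """Join pending fragments into one line and put it at the top."""
        if self._pending:
            self.lines.insert(0, " ".join(self._pending))
            self._pending.clear()
        self._last_flush = now


log = RealtimeLog(flush_interval=10.0)
log.add("hello", now=0.0)
log.add("world", now=6.0)
log.add("again", now=12.0)  # 12 s elapsed, so this flushes
log.add("next", now=15.0)
log.add("line", now=23.0)   # 11 s since the last flush, flushes again
print(log.lines)            # ['next line', 'hello world again']
```

In a real app the `now` argument would come from `time.monotonic()`, and each flushed line would be sent to transcription and translation before being shown.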
🛠️ Quick Start
Click “Duplicate” to fork, or launch directly.
Pick a tab (🎤/🔊/🎬/🖼️/📄/🔄) and feed it English input.
After a few seconds, see the 📜 original and 🌐 3-language translation side by side.
⚡ Tech Stack
openai/whisper-large-v3-turbo — fast, high-accuracy ASR
Nanonets-OCR-s (+ Flash Attention 2) — document/image OCR
Gradio Blocks — clean tabbed UI
PyTorch + CUDA — auto GPU allocation & ThreadPool load balancing
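As a rough illustration of how the video tab's FFmpeg step and the Whisper model listed above could fit together, the sketch below extracts a 16 kHz mono WAV from a video and passes it to the Hugging Face `transformers` ASR pipeline. The helper names (`build_ffmpeg_cmd`, `extract_audio`, `transcribe`) and the exact FFmpeg flags are assumptions; the Space's real pipeline may differ, and the translation step is not shown.

```python
import shutil
import subprocess
from pathlib import Path

ASR_MODEL = "openai/whisper-large-v3-turbo"  # model named in the post


def build_ffmpeg_cmd(video_path: str, wav_path: str) -> list:
    """Build an FFmpeg command that drops video and writes 16 kHz mono audio."""
    return [
        "ffmpeg", "-y",   # overwrite the output file without prompting
        "-i", video_path,
        "-vn",            # ignore the video stream
        "-ac", "1",       # downmix to mono
        "-ar", "16000",   # resample to 16 kHz, the rate Whisper expects
        wav_path,
    ]


def extract_audio(video_path: str) -> str:
    """Run FFmpeg and return the path of the extracted WAV file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    wav_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_cmd(video_path, wav_path), check=True)
    return wav_path


def transcribe(audio_path: str) -> str:
    """Transcribe English speech; uses the GPU when one is available."""
    # Heavy imports are deferred so the helpers above work without torch.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    asr = pipeline("automatic-speech-recognition", model=ASR_MODEL, device=device)
    # return_timestamps=True lets the pipeline handle audio longer than 30 s.
    return asr(audio_path, return_timestamps=True)["text"]
```

A call such as `transcribe(extract_audio("talk.mp4"))` would then yield the English transcript, where `talk.mp4` is a placeholder file name.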
📌 Notes
Translation quality depends on audio quality, lighting, and resolution.
Huge videos hit the HF Space upload cap (~2 GB).
Realtime tab requires browser microphone permission.