@openfree on Hugging Face

🌏 Whisper-OCR Multilingual Translation Space 🚀

Welcome! This Space takes English audio, video, images, and PDFs and translates them into Chinese (ZH), Thai (TH), and Russian (RU). English is the only supported source language.

VIDraft/voice-trans

✨ Key Features
🎤 Microphone – Record English speech → transcript + 3-language translation

🔊 Audio File – Upload English audio → transcript + translation

🎬 Video File – Auto-extract audio with FFmpeg → transcript + translation

🖼️ Image – Nanonets-OCR pulls text → translation

📄 PDF – Up to 50 pages of text & tables → translation

🔄 Realtime Mode – Flush every 10-15 s; newest lines appear at the top
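To make the Realtime Mode behaviour concrete, here is a minimal sketch of an interval-based flush that keeps the newest line on top. It is not the Space's actual code: `RealtimeTranscriptBuffer`, its `transcribe` callable, and the injectable `clock` are hypothetical names chosen for illustration, and the 10 s default simply mirrors the lower end of the 10-15 s range above.

```python
import time
from typing import Callable, List


class RealtimeTranscriptBuffer:
    """Hypothetical helper: buffers audio chunks and flushes them on an interval."""

    def __init__(
        self,
        transcribe: Callable[[bytes], str],   # stand-in for the ASR call (assumption)
        flush_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.transcribe = transcribe
        self.flush_seconds = flush_seconds
        self.clock = clock
        self._pending: List[bytes] = []
        self._last_flush = clock()
        self.lines: List[str] = []  # newest line is always at index 0

    def add_chunk(self, chunk: bytes) -> bool:
        """Store a chunk; flush and return True once the interval has elapsed."""
        self._pending.append(chunk)
        if self.clock() - self._last_flush >= self.flush_seconds:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Transcribe everything buffered so far and put the result on top."""
        if self._pending:
            text = self.transcribe(b"".join(self._pending)).strip()
            if text:
                self.lines.insert(0, text)
            self._pending.clear()
        self._last_flush = self.clock()

    def render(self) -> str:
        """Return the transcript with the newest line first."""
        return "\n".join(self.lines)
```

In a UI, `render()` would be what feeds the output box after each flush, so older lines scroll down as new ones arrive.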

🛠️ Quick Start
Click “Duplicate” to fork, or launch directly.

Pick a tab (🎤/🔊/🎬/🖼️/📄/🔄) and feed it English input.

After a few seconds, see the 📜 original and 🌐 3-language translation side by side.

⚡ Tech Stack
openai/whisper-large-v3-turbo — fast, high-accuracy ASR

Nanonets-OCR-s (+ Flash Attention 2) — document/image OCR

Gradio Blocks — clean tabbed UI

PyTorch + CUDA — auto GPU allocation & ThreadPool load balancing
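As a rough illustration of the ThreadPool idea, the sketch below fans one transcript out to the three target languages in parallel. The post does not say which translation backend the Space uses, so `translate` is a hypothetical callable and `fake_translate` is only a stub; treat this as one possible shape, not the Space's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

TARGET_LANGS = ("ZH", "TH", "RU")


def translate_all(
    text: str,
    translate: Callable[[str, str], str],  # hypothetical backend: (text, lang) -> translation
    max_workers: int = 3,
) -> Dict[str, str]:
    """Run one translation job per target language on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {lang: pool.submit(translate, text, lang) for lang in TARGET_LANGS}
        # Collect results in the fixed ZH/TH/RU order, regardless of finish order.
        return {lang: fut.result() for lang, fut in futures.items()}


def fake_translate(text: str, lang: str) -> str:
    """Stub translator used only to demonstrate the fan-out."""
    return f"[{lang}] {text}"


if __name__ == "__main__":
    print(translate_all("Hello, world", fake_translate))
```

Threads suit this pattern when each call mostly waits on I/O or on GPU work that releases the GIL; the real Space may balance work differently.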

📌 Notes
Translation quality depends on input quality: audio clarity for speech, and lighting and resolution for images.

Very large videos will hit the Hugging Face Space upload cap (~2 GB).

Realtime tab requires browser microphone permission.
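If a video is too large to upload, one workaround is to extract the audio locally and use the Audio File tab instead. The sketch below builds an FFmpeg command for that and checks a file against the approximate cap. The helper names are hypothetical, the 16 kHz mono output is an assumption based on the sample rate Whisper models expect, and the exact flags the Space itself passes to FFmpeg are not stated in the post.

```python
import os
import subprocess
from typing import List

# Approximate cap taken from the note above; the exact limit is an assumption.
UPLOAD_CAP_BYTES = 2 * 1024**3


def build_extract_command(video_path: str, audio_path: str) -> List[str]:
    """Build an FFmpeg command that drops the video stream and writes 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it exists
        "-i", video_path,  # input video
        "-vn",             # no video stream in the output
        "-ac", "1",        # one audio channel (mono)
        "-ar", "16000",    # 16 kHz sample rate
        audio_path,
    ]


def exceeds_upload_cap(path: str, cap_bytes: int = UPLOAD_CAP_BYTES) -> bool:
    """Return True if the file is larger than the given cap."""
    return os.path.getsize(path) > cap_bytes


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run FFmpeg; raises CalledProcessError if extraction fails."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
```

For example, `extract_audio("talk.mp4", "talk.wav")` (file names are placeholders) requires `ffmpeg` on the PATH and usually yields a file far smaller than the source video.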
