openfree
posted an update about 16 hours ago
🌏 Whisper-OCR Multilingual Translation Space 🚀

Welcome! This Space takes English audio, video, images, and PDFs and translates them within seconds into Chinese (ZH), Thai (TH), and Russian (RU). English is the only source language it expects.

VIDraft/voice-trans

✨ Key Features
🎤 Microphone – Record English speech → transcript + 3-language translation

🔊 Audio File – Upload English audio → transcript + translation

🎬 Video File – Auto-extract audio with FFmpeg → transcript + translation

🖼️ Image – Nanonets-OCR pulls text → translation

📄 PDF – Up to 50 pages of text & tables → translation

🔄 Realtime Mode – Flush every 10-15 s; newest lines appear at the top
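
For the 🎬 Video File path, here is a minimal sketch of how audio could be pulled out with FFmpeg before transcription. The function names and the choice of a mono 16 kHz WAV (the sample rate Whisper models work with) are assumptions for illustration, not the Space's actual code.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path: str, wav_path: str) -> list[str]:
    """Build an FFmpeg command that extracts a mono 16 kHz WAV track."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",           # drop the video stream, keep audio only
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        wav_path,
    ]


def extract_audio(video_path: str) -> str:
    """Run FFmpeg (must be on PATH) and return the extracted WAV path."""
    wav_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_cmd(video_path, wav_path), check=True)
    return wav_path
```

Calling extract_audio("talk.mp4") would write talk.wav next to the video; that file can then go to the ASR step.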

🛠️ Quick Start
Click “Duplicate” to fork, or launch directly.

Pick a tab (🎤/🔊/🎬/🖼️/📄/🔄) and feed it English input.

After a few seconds, see the 📜 original and 🌐 3-language translation side by side.

⚡ Tech Stack
openai/whisper-large-v3-turbo – fast, high-accuracy ASR

Nanonets-OCR-s (+ Flash Attention 2) – document/image OCR

Gradio Blocks – clean tabbed UI

PyTorch + CUDA – auto GPU allocation & ThreadPool load balancing
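
To illustrate the ThreadPool idea, here is a minimal sketch that sends one transcript to all three target languages concurrently. The post does not say which translation model the Space uses, so translate_stub is a hypothetical placeholder, and translate_all is an assumed helper name.

```python
from concurrent.futures import ThreadPoolExecutor

TARGET_LANGS = ("zh", "th", "ru")


def translate_stub(text: str, lang: str) -> str:
    """Hypothetical placeholder; a real Space would call a translation model."""
    return f"[{lang}] {text}"


def translate_all(text, translate=translate_stub, langs=TARGET_LANGS):
    """Translate one transcript into every target language in parallel."""
    with ThreadPoolExecutor(max_workers=len(langs)) as pool:
        # Submit one job per language, then collect results keyed by language.
        futures = {lang: pool.submit(translate, text, lang) for lang in langs}
        return {lang: fut.result() for lang, fut in futures.items()}
```

Threads suit this step when each translation call spends most of its time waiting on a GPU or a remote service rather than running Python code.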

📌 Notes
Translation quality depends on the input: audio clarity for speech, and lighting and resolution for images and scanned PDFs.

Huge videos hit the HF Space upload cap (~2 GB).

Realtime tab requires browser microphone permission.
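
The Realtime tab's described behavior (flush every 10-15 s, newest lines at the top) can be sketched as a small buffer. The RealtimeLog class, its method names, and the injectable clock are assumptions made for illustration; the Space's own implementation is not shown in the post.

```python
import time


class RealtimeLog:
    """Collects transcript lines and shows them newest-first on each flush."""

    def __init__(self, flush_interval: float = 10.0, clock=time.monotonic):
        self.flush_interval = flush_interval
        self._clock = clock
        self._pending = []   # lines received since the last flush
        self._shown = []     # lines already displayed, newest first
        self._last_flush = clock()

    def add(self, line: str) -> None:
        self._pending.append(line)

    def maybe_flush(self) -> bool:
        """Move pending lines to the display once the interval has elapsed."""
        now = self._clock()
        if now - self._last_flush < self.flush_interval or not self._pending:
            return False
        # Newest lines go on top: reverse the pending batch and prepend it.
        self._shown = self._pending[::-1] + self._shown
        self._pending = []
        self._last_flush = now
        return True

    def render(self) -> str:
        return "\n".join(self._shown)
```

In a UI loop, add() would be called for each new transcript line and maybe_flush() on a timer, with render() feeding the output textbox.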