Whisper-OCR Multilingual Translation Space
Welcome! This Space takes English audio, video, images, and PDFs and translates them into Chinese (ZH), Thai (TH), and Russian (RU) within seconds. English is the only source language you need.
VIDraft/voice-trans
Key Features
Microphone: record English speech → transcript + 3-language translation
Audio File: upload English audio → transcript + translation
Video File: audio is auto-extracted with FFmpeg → transcript + translation
Image: Nanonets-OCR extracts the text → translation
PDF: up to 50 pages of text and tables → translation
Realtime Mode: output is flushed every 10-15 s, with the newest lines at the top
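The Realtime Mode behaviour above (periodic flushes, newest lines first) can be sketched as a small buffer. This is a minimal sketch under assumptions, not the Space's actual code: the class name `RealtimeBuffer`, the 10 s default interval, and the injectable clock are hypothetical choices made for illustration.

```python
import time
from typing import Callable, List, Optional


class RealtimeBuffer:
    """Hypothetical sketch: collect transcript chunks and flush them on a timer.

    Flushed lines are kept newest-first, mirroring the
    "newest lines appear at the top" behaviour described above.
    """

    def __init__(self, flush_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.flush_interval = flush_interval  # seconds between flushes (assumed value)
        self.clock = clock                    # injectable so the timing can be tested
        self.pending: List[str] = []          # chunks received since the last flush
        self.history: List[str] = []          # flushed lines, newest first
        self.last_flush = clock()

    def add(self, chunk: str) -> Optional[str]:
        """Queue a chunk; return the flushed line if the interval has elapsed."""
        self.pending.append(chunk.strip())
        if self.clock() - self.last_flush >= self.flush_interval:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Join pending chunks into one line and put it at the top of the history."""
        self.last_flush = self.clock()
        line = " ".join(c for c in self.pending if c)
        self.pending.clear()
        if not line:
            return None
        self.history.insert(0, line)  # newest line goes to the top
        return line

    def render(self) -> str:
        """Text as it would appear in the output box, newest line first."""
        return "\n".join(self.history)
```

Injecting the clock keeps the flush logic independent of wall time, so the same class can be driven by a real timer in the UI and by a fake one in tests.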
Quick Start
Click "Duplicate" to fork the Space, or use it directly.
Pick a tab (Microphone, Audio File, Video File, Image, PDF, or Realtime) and give it English input.
After a few seconds, the original text and its 3-language translation appear side by side.
Tech Stack
openai/whisper-large-v3-turbo: fast, high-accuracy ASR
Nanonets-OCR-s (+ Flash Attention 2): document/image OCR
Gradio Blocks: clean tabbed UI
PyTorch + CUDA: automatic GPU allocation & ThreadPool load balancing
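The post does not say how the GPU allocation and ThreadPool load balancing work. One plausible approach, sketched here purely as an assumption, is to assign jobs round-robin across devices with `concurrent.futures.ThreadPoolExecutor` and guard each device with a lock so only one job runs on it at a time. The device list is passed in so the sketch runs without a GPU (in a real Space it might be built from `torch.cuda.device_count()`), and `run_balanced` and `fake_transcribe` are hypothetical names standing in for the real model call.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_balanced(jobs: Sequence[str],
                 devices: Sequence[str],
                 work: Callable[[str, str], str]) -> List[str]:
    """Round-robin each job onto a device and run the jobs in a thread pool.

    Results are returned in the same order as `jobs`.
    """
    if not devices:
        devices = ["cpu"]  # fall back to CPU when no GPU is visible
    locks = {dev: threading.Lock() for dev in devices}

    def guarded(job: str, dev: str) -> str:
        with locks[dev]:  # at most one job per device at a time
            return work(job, dev)

    assignments = list(zip(jobs, itertools.cycle(devices)))
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        futures = [pool.submit(guarded, job, dev) for job, dev in assignments]
        return [f.result() for f in futures]


def fake_transcribe(job: str, device: str) -> str:
    # Hypothetical stand-in for running Whisper on `device`.
    return f"{job}@{device}"


print(run_balanced(["a.wav", "b.wav", "c.wav"], ["cuda:0", "cuda:1"], fake_transcribe))
# ['a.wav@cuda:0', 'b.wav@cuda:1', 'c.wav@cuda:0']
```

Allowing only one job per GPU at a time is a conservative design choice that avoids out-of-memory errors with a large model. A real deployment might allow more concurrency per device or pick the least-loaded GPU instead of cycling.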
Notes
Translation quality depends on input quality: clear audio for speech, and good lighting and resolution for images and scans.
Very large videos can exceed the Hugging Face Space upload limit (about 2 GB).
The Realtime tab requires microphone permission in your browser.
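Because very large videos can exceed the roughly 2 GB upload limit, checking the file size locally before uploading avoids a failed transfer. This is a minimal sketch; treating "about 2 GB" as exactly 2 GiB is an assumption, and `UPLOAD_CAP_BYTES` and `check_upload_size` are hypothetical names, not part of the Space.

```python
import os

# Assumed cap: "about 2 GB" from the note above, taken here as 2 GiB.
UPLOAD_CAP_BYTES = 2 * 1024 ** 3


def check_upload_size(path: str, cap: int = UPLOAD_CAP_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds `cap`."""
    size = os.path.getsize(path)
    if size > cap:
        raise ValueError(
            f"{path} is {size / 1024 ** 3:.2f} GiB, over the assumed upload cap of "
            f"{cap / 1024 ** 3:.2f} GiB; consider uploading a shorter clip."
        )
    return size
```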