
VocRT - Personal Realtime Voice-to-Voice AI Solution

License: MIT · Python 3.10

https://vocrt.vercel.app

VocRT is a comprehensive, privacy-first Realtime Voice-to-Voice (V2V) solution that enables natural conversations with AI. Built with cutting-edge TTS models, RAG capabilities, and seamless integration, VocRT processes your voice input and responds with high-quality synthesized speech in real-time.

🚀 Key Features

Real-time Voice Processing

  • Ultra-low latency voice-to-voice conversion
  • High-quality speech synthesis using Kokoro-82M model
  • Customizable voice selection with multiple voice options
  • Adjustable threshold and silence duration for optimal user experience
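The adjustable threshold and silence-duration settings above can be pictured as a simple energy-based endpointing loop: speech ends once enough consecutive quiet frames follow detected speech. This is an illustrative sketch under assumed names and defaults, not VocRT's actual implementation.

```python
def detect_utterance_end(frames, energy_threshold=0.02, silence_frames=30):
    """Return the frame index where the utterance ends, or None.

    frames: list of per-frame RMS energy values (floats).
    An utterance ends once `silence_frames` consecutive frames fall
    below `energy_threshold` after speech has been detected.
    """
    speaking = False
    quiet_run = 0
    for i, energy in enumerate(frames):
        if energy >= energy_threshold:
            speaking = True   # speech resets the silence counter
            quiet_run = 0
        elif speaking:
            quiet_run += 1
            if quiet_run >= silence_frames:
                # first frame of the closing silence
                return i - silence_frames + 1
    return None
```

Raising `energy_threshold` makes the detector less sensitive to background noise; raising `silence_frames` tolerates longer mid-sentence pauses, at the cost of response latency.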

Advanced RAG Capabilities

  • Multi-format document support: PDF, CSV, TXT, PPT, PPTX, DOC, DOCX, XLS, XLSX
  • URL content extraction: Process web pages, Medium blogs, and online PDFs
  • Unlimited document uploads without usage limits or billing concerns
  • 100% privacy-first approach with local processing
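Before uploaded documents can be retrieved, their extracted text is typically split into overlapping chunks for embedding. A minimal sketch, assuming character-based chunks with a fixed overlap (VocRT's real chunking parameters may differ):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split extracted document text into overlapping chunks for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Each chunk starts `step` characters after the previous one, so
    # consecutive chunks share `overlap` characters of context.
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, which matters for answer quality in RAG.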

Privacy & Cost Benefits

  • No API usage limits or recurring charges
  • Complete data privacy - all processing happens locally
  • Offline capability: use a local LLM model if your resources allow
  • No data sharing with external AI services

πŸ—οΈ Architecture Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   React Client  │◄──►│  Express Server │◄──►│  VocRT Engine   │
│   (Frontend)    │    │   (Backend)     │    │   (Python)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                        │
                                 ┌──────────────────────┤
                                 │                      │
                                 ▼                      ▼
                        ┌──────────────────┐    ┌─────────────────┐
                        │    Embeddings    │    │   Whisper STT   │
                        │   (e5-base-v2)   │    │   Kokoro TTS    │
                        │    Qdrant DB     │    │                 │
                        │   (Vector Store) │    └─────────────────┘
                        └──────────────────┘

πŸ“ Repository Structure

VocRT/
├── backend/         # Express.js server
├── frontend/        # React client application
├── models/          # AI models directory
├── voices/          # Available voice profiles
├── demo/            # Sample audio and demo files
├── .env             # Environment configuration
├── requirements.txt # Python dependencies
└── README.md        # Project documentation

🛠️ Manual Installation

Prerequisites

  • Python 3.10 (required)
  • Node.js 16+ and npm
  • Docker (for Qdrant vector database)
  • Git for cloning repositories

Step 1: Clone Repository

git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT

Step 2: Python Environment Setup

macOS/Linux:

python3.10 -m venv venv
source venv/bin/activate

Windows:

python3.10 -m venv venv
venv\Scripts\activate

Step 3: Install Python Dependencies

pip install -r requirements.txt

If the installation fails (e.g. due to dependency or PyTorch issues), try the following recovery steps:

pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install -r requirements.txt

Step 4: Install eSpeak

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install espeak

macOS:

# Install Homebrew if not present
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install eSpeak
brew install espeak

Windows:

  1. Download from eSpeak official website
  2. Run installer and follow instructions
  3. Add installation path to system PATH environment variable
  4. Verify installation: espeak --version

Verification:

espeak "VocRT installation successful!"

Step 5: Backend Setup (Express.js)

cd backend
npm install
npm run dev

Step 6: Frontend Setup (Vite)

cd frontend
npm install
npm run dev

Step 7: Qdrant Vector Database Setup

Documentation: Qdrant Quickstart Guide

# Pull Qdrant image
docker pull qdrant/qdrant

# Start Qdrant container
docker run -p 6333:6333 -p 6334:6334 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
  qdrant/qdrant

Access Points:

  • REST API: http://localhost:6333
  • Web dashboard: http://localhost:6333/dashboard
  • gRPC: localhost:6334

Step 8: Download Required Models

Embedding Model:

Clone e5-base-v2 into models/e5-base-v2:

git clone https://huggingface.co/intfloat/e5-base-v2 models/e5-base-v2


Whisper STT Model:

Choose your preferred Whisper model size:

✅ Just specify the model name in app.py; it will be downloaded and loaded automatically.

  • tiny: Fastest, lower accuracy
  • base: Balanced performance
  • small: Better accuracy
  • medium/large: Highest accuracy, slower processing
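Since only the model name is needed, a small guard can validate it before loading. The trade-off table and helper below are illustrative, not VocRT's actual code; the `whisper.load_model` call in the comment assumes the openai-whisper package.

```python
# Valid openai-whisper model sizes and their rough trade-offs.
WHISPER_MODELS = {
    "tiny":   "fastest, lower accuracy",
    "base":   "balanced performance",
    "small":  "better accuracy",
    "medium": "high accuracy, slower",
    "large":  "highest accuracy, slowest",
}

def validate_whisper_model(name):
    """Return the model name if it is a valid Whisper size, else raise."""
    if name not in WHISPER_MODELS:
        raise ValueError(
            f"unknown Whisper model {name!r}; choose from {sorted(WHISPER_MODELS)}"
        )
    return name

# In app.py this could back the loading step, e.g.:
# model = whisper.load_model(validate_whisper_model("base"))
```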


Step 9: Environment Configuration

Edit .env file with your API credentials:

# LLM Configuration
OPENAI_API_KEY=your_openai_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
LLM_PROVIDER=google  # 'google' for Gemini, 'openai' for OpenAI
LLM_MODEL=gemini-2.0-flash  # or your preferred model
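The variable names below match the .env keys above, but the dispatch logic itself is a sketch of how the server might pick a provider, not VocRT's actual code; the `gpt-4o-mini` fallback is a placeholder default.

```python
import os

def resolve_llm_config(env=os.environ):
    """Pick provider, model, and API key from .env-style settings."""
    provider = env.get("LLM_PROVIDER", "google").lower()
    if provider == "google":
        key = env.get("GEMINI_API_KEY")
        model = env.get("LLM_MODEL", "gemini-2.0-flash")
    elif provider == "openai":
        key = env.get("OPENAI_API_KEY")
        model = env.get("LLM_MODEL", "gpt-4o-mini")  # placeholder default
    else:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    if not key:
        raise RuntimeError(f"missing API key for provider {provider!r}")
    return provider, model, key
```

Failing fast on a missing key at startup beats discovering it mid-conversation.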

Step 10: Launch VocRT Server

python3 app.py

🎯 Usage Guide

  1. Access the application: Navigate to http://localhost:3000
  2. Select voice profile: Choose from available voice options
  3. Configure settings: Adjust silence duration for optimal performance
  4. Add context: Upload documents, provide URLs, or enter text for AI context
  5. Start conversation: Begin speaking and enjoy real-time voice responses

📊 Supported Document Formats

Format      Extension       Description
PDF         .pdf            Portable Document Format
Text        .txt            Plain text files
Word        .doc, .docx     Microsoft Word documents
Excel       .xls, .xlsx     Microsoft Excel spreadsheets
PowerPoint  .ppt, .pptx     Microsoft PowerPoint presentations
CSV         .csv            Comma-separated values
URLs        Web links       Online content, blogs, PDFs
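A context source from the table above can be routed by extension (or treated as a URL). The set and helper below are an illustrative sketch, not VocRT's real ingestion code.

```python
from pathlib import Path

# Extensions from the supported-formats table above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".txt", ".doc", ".docx",
    ".xls", ".xlsx", ".ppt", ".pptx", ".csv",
}

def is_supported(source):
    """True for a supported file extension or an http(s) URL."""
    if source.startswith(("http://", "https://")):
        return True
    return Path(source).suffix.lower() in SUPPORTED_EXTENSIONS
```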

🤖 AI Models & Technology Stack

Core Models

  • TTS: Kokoro-82M - High-quality text-to-speech
  • STT: OpenAI Whisper - Accurate speech recognition
  • Embeddings: e5-base-v2 - Semantic text understanding
  • LLM: OpenAI GPT / Google Gemini - Natural language processing

Technology Stack

  • Backend: Python, Express.js, gRPC
  • Frontend: React, Vite
  • Database: Qdrant (Vector Database)
  • Audio Processing: Whisper, eSpeak, phonemizer

🔧 Performance Optimization

Hardware Recommendations

  • CPU: Multi-core processor (4+ cores recommended)
  • RAM: 4GB+ for optimal performance
  • Storage: SSD for faster model loading
  • GPU: Optional; accelerated inference can reduce latency by up to 60%

Configuration Tips

  • Modify silence duration for natural conversation flow
  • Use smaller Whisper models for faster STT processing
  • Enable GPU acceleration if available

🤝 Contributing

We welcome contributions from the community! Here's how you can help:

Ways to Contribute

  • πŸ› Bug Reports: Submit issues with detailed reproduction steps
  • πŸ’‘ Feature Requests: Suggest new capabilities and improvements
  • πŸ“ Documentation: Improve guides, tutorials, and API docs
  • πŸ”§ Code Contributions: Submit pull requests with enhancements

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

Special thanks to the amazing open-source communities behind Kokoro, Whisper, Qdrant, e5-base-v2, and eSpeak.

📞 Support & Contact

Website: https://vocrt.vercel.app

⭐ If VocRT helps your projects, please consider giving it a star!


Built with ❀️ for the open-source community
