metadata

title: Talking Head Backend
emoji: 🗣️
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860

Talking Head Backend

This Space hosts the backend for the Talking Head application. The frontend for this application can be found here. (Please update if this link is not for your specific frontend).

Setup for Hugging Face Space

To run this backend successfully on Hugging Face Spaces, you need to configure a few things:

API Keys: This backend requires API keys for OpenAI and ElevenLabs. These must be set as Secrets in your Hugging Face Space settings. Navigate to your Space > Settings > Repository secrets (scroll down) and add the following secrets:
- OPENAI_API_KEY: Your OpenAI API key.
- ELEVENLABS_API_KEY: Your ElevenLabs API key.
The application reads these from environment variables.
Rhubarb Lip Sync: The application uses Rhubarb Lip Sync to generate lip sync data. The rhubarb executable must be present in the bin/ directory of this repository. The Dockerfile copies the contents of the backend/bin/ directory, so if rhubarb was placed at /Users/marcos/Documents/projects/talkinghead/backend/bin/rhubarb before the copy commands were run, it should be included in the Docker image at /home/node/app/bin/rhubarb.

If you haven't already, download the Rhubarb Lip Sync binary from here and place it in mineru_space/backend/bin/. Choose the build that matches the environment where the backend runs: Spaces typically run Linux x86-64, so the image needs the corresponding Linux binary. Then make sure your local git commit includes the binary in the correct location, re-copying it if necessary.
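The two requirements above can be checked before pushing to the Space. The following is a minimal sketch, not part of this repository: the helper names `missing_secrets` and `rhubarb_ready` are hypothetical, and the `backend/bin` path is assumed from the local layout described in this README.

```python
import os
from pathlib import Path

# Secret names as listed in the setup instructions above.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def missing_secrets(env=None):
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


def rhubarb_ready(bin_dir):
    """Return True if bin_dir/rhubarb is a file marked as executable."""
    candidate = Path(bin_dir) / "rhubarb"
    return candidate.is_file() and os.access(candidate, os.X_OK)


if __name__ == "__main__":
    for name in missing_secrets():
        print(f"missing secret: {name}")
    # Assumed local layout; adjust the path if your checkout differs.
    if not rhubarb_ready("backend/bin"):
        print("rhubarb not found or not executable in backend/bin")
```

Passing the environment as a plain mapping keeps the check easy to exercise without touching real secrets.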

Local Development (Reminder from original backend/README.md)

For local development, remember to:

Create a .env file in the backend sub-directory with your OPENAI_API_KEY and ELEVENLABS_API_KEY.
Place the Rhubarb binary in backend/bin/.
Run yarn install and yarn dev in the backend sub-directory.

This Space is configured to use the PORT environment variable, defaulting to 7860. Your index.js should respect process.env.PORT.
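The fallback rule itself is small. index.js is JavaScript, so the sketch below is only an illustration of the same logic in Python (the language of the client script in this repository); `resolve_port` is a hypothetical name, and treating non-numeric or out-of-range values as "use the default" is an assumption rather than documented behavior.

```python
import os

DEFAULT_PORT = 7860  # matches app_port in the Space metadata


def resolve_port(env=None):
    """Return the port from PORT, falling back to DEFAULT_PORT."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    try:
        port = int(raw)
    except ValueError:
        # Unset, empty, or non-numeric PORT: use the default.
        return DEFAULT_PORT
    # Assumption: values outside the valid TCP range also fall back.
    return port if 0 < port < 65536 else DEFAULT_PORT
```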

Endpoints

/chat: Handles text-based chat interactions.
/voice-chat: Handles voice-based chat interactions.
/voices: Lists available voices from ElevenLabs.


Features

Web interface for uploading and converting PDF files
API endpoint for programmatic access
High-quality PDF extraction with support for tables, formulas, and complex layouts
Output in both Markdown and structured JSON formats

API Usage

The service exposes a dedicated API endpoint for programmatic access:

PDF Conversion Endpoint

POST /api/convert

Request:

Content-Type: multipart/form-data
Body: form field 'file' containing the PDF file

Response:

{
  "success": true,
  "message": "PDF conversion successful",
  "job_id": "uuid",
  "base_filename": "filename",
  "markdown": "# Converted markdown content...",
  "json": { 
    "title": "Document Title",
    "sections": [...]
  },
  "log": "Processing log..."
}
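A response of this shape can be unpacked into files on disk. The sketch below is one possible way to do that, assuming that a failed conversion is reported with "success": false and a "message" field (the failure shape is not documented here); `save_conversion` is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path


def save_conversion(response_text, out_dir="."):
    """Write the markdown and json fields of a response to out_dir."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        # Assumed failure shape: success is false, message explains why.
        raise RuntimeError(payload.get("message", "conversion failed"))

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Use only the final path component so the name cannot escape out_dir.
    base = Path(payload.get("base_filename") or payload.get("job_id") or "output").name
    md_path = out / f"{base}.md"
    json_path = out / f"{base}.json"

    md_path.write_text(payload.get("markdown", ""), encoding="utf-8")
    json_path.write_text(
        json.dumps(payload.get("json", {}), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return md_path, json_path
```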

Client Example

A Python client script (api_client.py) is included in this repository for easy integration:

# Example usage
python api_client.py path/to/your/document.pdf --api-url https://marcosremar2-mineru.hf.space

You can also use curl:

curl -X POST -F "file=@path/to/your/document.pdf" https://marcosremar2-mineru.hf.space/api/convert
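For callers who prefer to stay inside Python without extra dependencies, the same request can be sketched with the standard library. This is not the bundled api_client.py, whose contents are not shown here; `build_multipart` and `convert_pdf` are hypothetical names, and the 600-second timeout is an arbitrary assumption for large documents.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field, filename, data, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path, api_url, timeout=600):
    """POST a PDF to <api_url>/api/convert and return the decoded JSON."""
    path = Path(pdf_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        api_url.rstrip("/") + "/api/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `convert_pdf("path/to/your/document.pdf", "https://marcosremar2-mineru.hf.space")` would return the response object described under PDF Conversion Endpoint.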

Web Interface

The Space also provides a web interface where you can:

Upload PDF files for conversion
View the generated Markdown and JSON
Download the converted files
View processing logs

Implementation Details

This service uses:

MinerU for high-quality PDF extraction
Flask web server for the interface and API
Docker container for deployment on Hugging Face Spaces

Learn More

For more information about MinerU, visit the MinerU repository.