
AI_Avatar_Chat / README.md

bravedims

📏 Fix short_description length for HuggingFace Spaces validation

f4f48b3 4 months ago


metadata

title: OmniAvatar-14B Video Generation
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
suggested_hardware: a10g-small
suggested_storage: large
short_description: Avatar video generation with adaptive body animation
models:
  - OmniAvatar/OmniAvatar-14B
  - Wan-AI/Wan2.1-T2V-14B
  - facebook/wav2vec2-base-960h
tags:
  - avatar-generation
  - video-generation
  - text-to-video
  - audio-driven-animation
  - lip-sync
  - body-animation
preload_from_hub:
  - OmniAvatar/OmniAvatar-14B
  - facebook/wav2vec2-base-960h

🎬 OmniAvatar-14B: Avatar Video Generation with Adaptive Body Animation

This is a VIDEO GENERATION application that creates animated avatar videos, not just audio!

🎯 What This Application Does

PRIMARY FUNCTION: Avatar Video Generation

✅ Generates 480p MP4 videos of animated avatars
✅ Audio-driven lip-sync with precise mouth movements
✅ Adaptive body animation that responds to speech content
✅ Reference image support for character consistency
✅ Prompt-controlled behavior for specific actions and expressions

Input → Output:

Text Prompt + Audio/TTS → MP4 Avatar Video (480p, 25fps)

Example:

Input: "A professional teacher explaining mathematics" + "Hello students, today we'll learn calculus"
Output: MP4 video of an avatar teacher with lip-sync and teaching gestures

🚀 Quick Start - Video Generation

1. Generate Avatar Videos

Web Interface: Use the Gradio interface above
API Endpoint: Available at /generate
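
As a rough sketch of calling the /generate endpoint from a script: this README does not document the request schema, so the JSON field names (`prompt`, `text`, `guidance_scale`, `audio_scale`, `num_steps`), the helper name `build_generate_request`, and the base URL below are all assumptions for illustration. Check the app's actual API before relying on any of them.

```python
import json
from urllib import request


def build_generate_request(base_url, prompt, text,
                           guidance_scale=5.0, audio_scale=4.0, num_steps=30):
    """Build (but do not send) a POST request for the /generate endpoint.

    Every JSON field name here is hypothetical; the README only states
    that an endpoint is available at /generate.
    """
    payload = {
        "prompt": prompt,                  # character description
        "text": text,                      # speech text for TTS
        "guidance_scale": guidance_scale,  # assumed default, mid-range
        "audio_scale": audio_scale,        # assumed default, mid-range
        "num_steps": num_steps,            # assumed default
    }
    return request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request(
    "https://example.invalid",  # placeholder; substitute the Space's URL
    prompt="A professional teacher explaining mathematics",
    text="Hello students, today we'll learn calculus",
)
# To actually send it: request.urlopen(req, timeout=300)
```

The request is only constructed here, not sent, because the real schema is unknown and a generation call may take a while to return.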

2. Model Requirements

This application requires large models (~30GB) for video generation:

Wan2.1-T2V-14B: Base text-to-video model (~28GB)
OmniAvatar-14B: Avatar animation weights (~2GB)
wav2vec2-base-960h: Audio encoder (~360MB)

Note: Models will be automatically downloaded on first use
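
Because the first run downloads all three models, it can help to check free disk space beforehand. The sketch below simply adds up the approximate sizes listed above; the helper names and the 5 GB headroom are assumptions, not part of the application.

```python
import shutil

# Approximate sizes in GB, copied from the model list above.
MODEL_SIZES_GB = {
    "Wan-AI/Wan2.1-T2V-14B": 28.0,
    "OmniAvatar/OmniAvatar-14B": 2.0,
    "facebook/wav2vec2-base-960h": 0.36,
}


def total_model_size_gb(sizes=MODEL_SIZES_GB):
    """Sum the approximate download sizes."""
    return sum(sizes.values())


def has_room_for_models(path=".", headroom_gb=5.0):
    """Return True if `path` has enough free space for all models.

    headroom_gb is an arbitrary safety margin (an assumption).
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= total_model_size_gb() + headroom_gb


print(f"Models need about {total_model_size_gb():.1f} GB")
print("Enough free space:", has_room_for_models())
```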

🎬 Video Generation Examples

Web Interface Usage:

1. Enter character description: "A friendly news anchor delivering breaking news"
2. Provide speech text: "Good evening, this is your news update"
3. Select voice profile: Choose from available options
4. Generate: Click to create your avatar video

Expected Output:

Format: MP4 video file
Resolution: 480p (854x480)
Frame Rate: 25fps
Duration: Matches audio length (up to 30 seconds)
Features: Lip-sync, body animation, realistic movements
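
Since the duration follows the audio and the frame rate is fixed, the expected frame count can be estimated up front. This is a minimal sketch: the function name is hypothetical, and clipping audio longer than 30 seconds (rather than rejecting it) is an assumption, as the README only states the upper limit.

```python
FPS = 25            # frame rate stated above
MAX_SECONDS = 30.0  # maximum duration stated above


def expected_frame_count(audio_seconds, fps=FPS, max_seconds=MAX_SECONDS):
    """Estimate the number of video frames for a given audio length.

    Assumes audio longer than max_seconds is clipped; the app may
    instead reject it.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    clipped = min(audio_seconds, max_seconds)
    return round(clipped * fps)


print(expected_frame_count(12.0))  # a 12-second clip
print(expected_frame_count(45.0))  # longer than the limit, so clipped
```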

🎯 Prompt Engineering for Videos

Effective Prompt Structure:

[Character Description] + [Behavior/Action] + [Setting/Context]

Examples:

"A professional doctor explaining medical procedures with gentle hand gestures - white coat - modern clinic"
"An energetic fitness instructor demonstrating exercises - athletic wear - gym environment"
"A calm therapist providing advice with empathetic expressions - cozy office setting"
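
The structure above can be assembled programmatically when generating many prompts. In this sketch the function name and the " - " separator (borrowed from the examples) are assumptions; the model accepts free text, so no particular separator is required.

```python
def build_prompt(character, behavior="", setting=""):
    """Join the three prompt parts, skipping any that are empty."""
    parts = [p.strip() for p in (character, behavior, setting) if p and p.strip()]
    if not parts:
        raise ValueError("at least a character description is required")
    return " - ".join(parts)


prompt = build_prompt(
    "A calm therapist",
    "providing advice with empathetic expressions",
    "cozy office setting",
)
print(prompt)
```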

Tips for Better Videos:

Be specific about appearance - clothing, hair, age, etc.
Include desired actions - gesturing, pointing, demonstrating
Specify the setting - office, classroom, studio, outdoor
Mention emotion/tone - confident, friendly, professional, energetic

⚙️ Configuration

Video Quality Settings:

Guidance Scale: Controls prompt adherence (4-6 recommended)
Audio Scale: Controls lip-sync strength (3-5 recommended)
Steps: Number of denoising steps; more steps improve quality but take longer (20-50 steps)
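
One way to keep these settings within the ranges above is a small check before submitting a job. This is only a sketch: the class name, field names, and default values are assumptions, and the ranges are recommendations rather than hard limits, so the check returns warnings instead of raising errors.

```python
from dataclasses import dataclass

# Ranges taken from the settings list above.
RECOMMENDED = {
    "guidance_scale": (4.0, 6.0),
    "audio_scale": (3.0, 5.0),
    "steps": (20, 50),
}


@dataclass
class GenerationSettings:
    # Defaults are mid-range guesses, not values from the app.
    guidance_scale: float = 5.0
    audio_scale: float = 4.0
    steps: int = 30

    def warnings(self):
        """List every setting that falls outside its recommended range."""
        out = []
        for name, (low, high) in RECOMMENDED.items():
            value = getattr(self, name)
            if not low <= value <= high:
                out.append(f"{name}={value} is outside the recommended range {low}-{high}")
        return out


for message in GenerationSettings(guidance_scale=9.0, steps=10).warnings():
    print(message)
```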

Performance:

GPU Accelerated: Optimized for A10G hardware
Generation Time: ~30-60 seconds per video
Quality: Professional 480p output with smooth animation

🔧 Technical Details

Model Architecture:

Base: Wan2.1-T2V-14B for text-to-video generation
Avatar: OmniAvatar-14B LoRA weights for character animation
Audio: wav2vec2-base-960h for speech feature extraction
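
To give a sense of scale for the audio side, the sketch below counts how many feature vectors the wav2vec2-base convolutional encoder produces for a clip, using that model's published kernel and stride configuration and 16 kHz input. How OmniAvatar aligns these features to video frames is not described in this README, so the per-frame ratio is only an illustration, and the function name is hypothetical.

```python
# (kernel, stride) of the seven conv layers in the wav2vec2-base feature encoder.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
SAMPLE_RATE = 16_000  # wav2vec2-base-960h expects 16 kHz audio
VIDEO_FPS = 25        # frame rate stated in this README


def wav2vec2_feature_frames(num_samples):
    """Number of feature vectors produced for `num_samples` audio samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return max(length, 0)


seconds = 30
features = wav2vec2_feature_frames(seconds * SAMPLE_RATE)
video_frames = seconds * VIDEO_FPS
print(f"{features} audio features for {video_frames} video frames "
      f"(about {features / video_frames:.1f} per frame)")
```

The encoder's overall stride is 320 samples (20 ms), so it yields roughly two feature vectors per 25 fps video frame.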

Capabilities:

Audio-driven facial animation with precise lip-sync
Adaptive body gestures based on speech content
Character consistency with reference images
High-quality 480p video output at 25fps

💡 Important Notes

This is a VIDEO Generation Application:

🎬 Primary Output: MP4 avatar videos with animation
🎤 Audio Input: Text-to-speech or direct audio files
🎯 Core Feature: Adaptive body animation synchronized with speech
✨ Advanced: Reference image support for character consistency

🔗 References

OmniAvatar Paper: arXiv:2506.18866
Model Hub: OmniAvatar/OmniAvatar-14B
Base Model: Wan-AI/Wan2.1-T2V-14B

🎬 This application creates AVATAR VIDEOS with adaptive body animation - professional quality video generation!