metadata
title: OmniAvatar-14B Video Generation
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
suggested_hardware: a10g-small
suggested_storage: large
short_description: Avatar video generation with adaptive body animation
models:
- OmniAvatar/OmniAvatar-14B
- Wan-AI/Wan2.1-T2V-14B
- facebook/wav2vec2-base-960h
tags:
- avatar-generation
- video-generation
- text-to-video
- audio-driven-animation
- lip-sync
- body-animation
preload_from_hub:
- OmniAvatar/OmniAvatar-14B
- facebook/wav2vec2-base-960h
🎬 OmniAvatar-14B: Avatar Video Generation with Adaptive Body Animation
This is a video generation application: it produces animated avatar videos, not just audio.
🎯 What This Application Does
PRIMARY FUNCTION: Avatar Video Generation
- ✅ Generates 480p MP4 videos of animated avatars
- ✅ Audio-driven lip-sync with precise mouth movements
- ✅ Adaptive body animation that responds to speech content
- ✅ Reference image support for character consistency
- ✅ Prompt-controlled behavior for specific actions and expressions
Input → Output:
Text Prompt + Audio/TTS → MP4 Avatar Video (480p, 25fps)
Example:
- Input: "A professional teacher explaining mathematics" + "Hello students, today we'll learn calculus"
- Output: MP4 video of an avatar teacher with lip-sync and teaching gestures
🚀 Quick Start - Video Generation
1. Generate Avatar Videos
- Web Interface: Use the Gradio interface above
- API Endpoint: Available at /generate
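As a rough illustration of calling the endpoint from Python, the sketch below builds a POST request for /generate. The JSON field names (`prompt`, `text`) and the base URL are assumptions made for illustration; the real request schema is whatever app.py defines and may differ.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate endpoint.

    The payload keys below are hypothetical placeholders, not a documented schema.
    """
    payload = {
        "prompt": prompt,  # character description (assumed field name)
        "text": text,      # speech text to synthesize (assumed field name)
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_generate_request(
        "https://example.invalid",  # placeholder host, replace with the Space URL
        "A professional teacher explaining mathematics",
        "Hello students, today we'll learn calculus",
    )
    print(request.get_method(), request.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen(request, timeout=...)` with a generous timeout, since generation is not instantaneous.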
2. Model Requirements
This application requires large models (~30GB) for video generation:
- Wan2.1-T2V-14B: Base text-to-video model (~28GB)
- OmniAvatar-14B: Avatar animation weights (~2GB)
- wav2vec2-base-960h: Audio encoder (~360MB)
Note: Models are downloaded automatically on first use.
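Because the first run downloads all three models, it can help to check free disk space beforehand. The following is a minimal sketch using only the standard library; the sizes are the approximate figures listed above, and the `headroom_gb` margin is an assumed safety buffer, not a documented requirement.

```python
import shutil

# Approximate download sizes in GB, taken from the list above.
MODEL_SIZES_GB = {
    "Wan-AI/Wan2.1-T2V-14B": 28.0,
    "OmniAvatar/OmniAvatar-14B": 2.0,
    "facebook/wav2vec2-base-960h": 0.36,
}


def required_gb() -> float:
    """Total approximate size of all models in GB."""
    return sum(MODEL_SIZES_GB.values())


def has_room_for_models(path: str = ".", headroom_gb: float = 5.0) -> bool:
    """Return True if `path` has enough free space for the models plus a margin.

    `headroom_gb` is a hypothetical buffer for caches and temporary files.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb() + headroom_gb


if __name__ == "__main__":
    print(f"Models need about {required_gb():.2f} GB")
    print("Enough space:", has_room_for_models("."))
```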
🎬 Video Generation Examples
Web Interface Usage:
- Enter character description: "A friendly news anchor delivering breaking news"
- Provide speech text: "Good evening, this is your news update"
- Select voice profile: Choose from available options
- Generate: Click to create your avatar video
Expected Output:
- Format: MP4 video file
- Resolution: 480p (854x480)
- Frame Rate: 25fps
- Duration: Matches audio length (up to 30 seconds)
- Features: Lip-sync, body animation, realistic movements
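Since the video duration follows the audio length at a fixed 25fps, the number of frames can be estimated ahead of time. This sketch assumes that audio longer than the 30-second limit is clipped to 30 seconds; the source does not say how the application handles over-length audio, so treat that behavior as an assumption.

```python
FPS = 25               # output frame rate stated above
MAX_DURATION_S = 30.0  # maximum video duration stated above


def frame_count(audio_seconds: float) -> int:
    """Estimate the number of output frames for a given audio length.

    Over-length audio is clipped to MAX_DURATION_S (an assumed policy).
    """
    if audio_seconds <= 0:
        raise ValueError("audio length must be positive")
    clipped = min(audio_seconds, MAX_DURATION_S)
    return round(clipped * FPS)


if __name__ == "__main__":
    for seconds in (4.0, 30.0, 45.0):
        print(f"{seconds:>5.1f} s of audio -> {frame_count(seconds)} frames")
```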
🎯 Prompt Engineering for Videos
Effective Prompt Structure:
[Character Description] + [Behavior/Action] + [Setting/Context]
Examples:
"A professional doctor explaining medical procedures with gentle hand gestures - white coat - modern clinic"
"An energetic fitness instructor demonstrating exercises - athletic wear - gym environment"
"A calm therapist providing advice with empathetic expressions - cozy office setting"
Tips for Better Videos:
- Be specific about appearance - clothing, hair, age, etc.
- Include desired actions - gesturing, pointing, demonstrating
- Specify the setting - office, classroom, studio, outdoor
- Mention emotion/tone - confident, friendly, professional, energetic
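The prompt structure above can be assembled programmatically. The helper below is a small sketch: `build_prompt` is a hypothetical function, not part of the application, and the " - " separator is inferred from the example prompts rather than required by the model.

```python
def build_prompt(character: str, appearance: str = "", setting: str = "") -> str:
    """Join the non-empty prompt parts with " - ", mirroring the examples above.

    `character` should already include the behavior or action, e.g.
    "A calm therapist providing advice with empathetic expressions".
    """
    parts = [part.strip() for part in (character, appearance, setting) if part and part.strip()]
    if not parts:
        raise ValueError("at least a character description is required")
    return " - ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "An energetic fitness instructor demonstrating exercises",
        appearance="athletic wear",
        setting="gym environment",
    ))
```

Keeping the parts separate makes it easy to vary one element, such as the setting, while holding the character description fixed.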
⚙️ Configuration
Video Quality Settings:
- Guidance Scale: Controls prompt adherence (4-6 recommended)
- Audio Scale: Controls lip-sync strength (3-5 recommended)
- Steps: Quality vs speed trade-off (20-50 steps)
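One way to keep these settings together and flag values outside the recommended ranges is a small dataclass. This is a sketch only: the class name, field names, and default values (midpoints of the ranges above) are assumptions and do not reflect the application's actual parameter names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the quality settings described above."""

    guidance_scale: float = 5.0  # prompt adherence, 4-6 recommended
    audio_scale: float = 4.0     # lip-sync strength, 3-5 recommended
    num_steps: int = 30          # quality vs speed, 20-50 steps

    def warnings(self) -> List[str]:
        """Return a message for each value outside its recommended range."""
        notes = []
        if not 4 <= self.guidance_scale <= 6:
            notes.append(f"guidance_scale {self.guidance_scale} is outside 4-6")
        if not 3 <= self.audio_scale <= 5:
            notes.append(f"audio_scale {self.audio_scale} is outside 3-5")
        if not 20 <= self.num_steps <= 50:
            notes.append(f"num_steps {self.num_steps} is outside 20-50")
        return notes


if __name__ == "__main__":
    print(GenerationSettings().warnings())
    print(GenerationSettings(guidance_scale=9.0, num_steps=10).warnings())
```

The ranges are recommendations rather than hard limits, so the sketch reports warnings instead of raising errors.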
Performance:
- GPU Accelerated: Optimized for A10G hardware
- Generation Time: ~30-60 seconds per video
- Quality: Professional 480p output with smooth animation
🔧 Technical Details
Model Architecture:
- Base: Wan2.1-T2V-14B for text-to-video generation
- Avatar: OmniAvatar-14B LoRA weights for character animation
- Audio: wav2vec2-base-960h for speech feature extraction
Capabilities:
- Audio-driven facial animation with precise lip-sync
- Adaptive body gestures based on speech content
- Character consistency with reference images
- High-quality 480p video output at 25fps
💡 Important Notes
This is a VIDEO Generation Application:
- 🎬 Primary Output: MP4 avatar videos with animation
- 🎤 Audio Input: Text-to-speech or direct audio files
- 🎯 Core Feature: Adaptive body animation synchronized with speech
- ✨ Advanced: Reference image support for character consistency
📚 References
- OmniAvatar Paper: arXiv:2506.18866
- Model Hub: OmniAvatar/OmniAvatar-14B
- Base Model: Wan-AI/Wan2.1-T2V-14B
🎬 This application creates AVATAR VIDEOS with adaptive body animation - professional quality video generation!