AI_Avatar_Chat / README.md
bravedims
πŸ“ Fix short_description length for HuggingFace Spaces validation
f4f48b3
|
raw
history blame
4.9 kB
metadata
title: OmniAvatar-14B Video Generation
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
suggested_hardware: a10g-small
suggested_storage: large
short_description: Avatar video generation with adaptive body animation
models:
  - OmniAvatar/OmniAvatar-14B
  - Wan-AI/Wan2.1-T2V-14B
  - facebook/wav2vec2-base-960h
tags:
  - avatar-generation
  - video-generation
  - text-to-video
  - audio-driven-animation
  - lip-sync
  - body-animation
preload_from_hub:
  - OmniAvatar/OmniAvatar-14B
  - facebook/wav2vec2-base-960h

🎬 OmniAvatar-14B: Avatar Video Generation with Adaptive Body Animation

This is a video generation application: it produces animated avatar videos, not just audio.

🎯 What This Application Does

PRIMARY FUNCTION: Avatar Video Generation

  • ✅ Generates 480p MP4 videos of animated avatars
  • ✅ Audio-driven lip-sync with precise mouth movements
  • ✅ Adaptive body animation that responds to speech content
  • ✅ Reference image support for character consistency
  • ✅ Prompt-controlled behavior for specific actions and expressions

Input → Output:

Text Prompt + Audio/TTS → MP4 Avatar Video (480p, 25fps)

Example:

  • Input: "A professional teacher explaining mathematics" + "Hello students, today we'll learn calculus"
  • Output: MP4 video of an avatar teacher with lip-sync and teaching gestures

🚀 Quick Start - Video Generation

1. Generate Avatar Videos

  • Web Interface: Use the Gradio interface above
  • API Endpoint: Available at /generate
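
As a rough illustration of calling the /generate endpoint from a script, the sketch below builds a JSON POST request with the standard library. The request schema is not documented here, so the field names `prompt` and `text_to_speech`, the JSON body format, and the `http://localhost:7860` base URL are all assumptions; check the app's own API docs for the real contract.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str, speech_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate endpoint."""
    # ASSUMPTION: the field names below are hypothetical placeholders.
    payload = {"prompt": prompt, "text_to_speech": speech_text}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "http://localhost:7860",  # ASSUMPTION: local Gradio default port
    "A professional teacher explaining mathematics",
    "Hello students, today we'll learn calculus",
)
# To actually send it: urllib.request.urlopen(request, timeout=300)
```

Building the request separately from sending it makes the payload easy to inspect before committing to a long-running generation call.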

2. Model Requirements

This application requires large models (~30GB) for video generation:

  • Wan2.1-T2V-14B: Base text-to-video model (~28GB)
  • OmniAvatar-14B: Avatar animation weights (~2GB)
  • wav2vec2-base-960h: Audio encoder (~360MB)

Note: Models are downloaded automatically on first use.
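
Because the first run downloads roughly 30GB, it can help to confirm there is enough free disk space beforehand. The sketch below sums the approximate sizes listed above and compares the total against the free space at a path. The 10% headroom factor and the helper names are assumptions made for illustration, not part of the app.

```python
import shutil

# Approximate download sizes from the list above, in GB.
MODEL_SIZES_GB = {
    "Wan-AI/Wan2.1-T2V-14B": 28.0,
    "OmniAvatar/OmniAvatar-14B": 2.0,
    "facebook/wav2vec2-base-960h": 0.36,
}


def required_gb(sizes=MODEL_SIZES_GB, headroom=1.1):
    """Total model size plus a safety margin (ASSUMPTION: 10% headroom)."""
    return sum(sizes.values()) * headroom


def has_room(path=".", sizes=MODEL_SIZES_GB):
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb(sizes)


total = sum(MODEL_SIZES_GB.values())
print(f"Models total ~{total:.1f} GB; room available: {has_room('.')}")
```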

🎬 Video Generation Examples

Web Interface Usage:

  1. Enter character description: "A friendly news anchor delivering breaking news"
  2. Provide speech text: "Good evening, this is your news update"
  3. Select voice profile: Choose from available options
  4. Generate: Click to create your avatar video

Expected Output:

  • Format: MP4 video file
  • Resolution: 480p (854x480)
  • Frame Rate: 25fps
  • Duration: Matches audio length (up to 30 seconds)
  • Features: Lip-sync, body animation, realistic movements
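
The relationship between audio length and video length can be sketched as simple arithmetic: at 25fps with a 30-second cap, the frame count follows from the audio duration. Whether the app truncates or rejects audio longer than 30 seconds is not stated, so the clipping below is an assumption, as is rounding partial frames up.

```python
import math

FPS = 25            # frame rate stated above
MAX_SECONDS = 30.0  # maximum duration stated above


def expected_frames(audio_seconds: float) -> int:
    """Estimate how many video frames an audio clip of this length yields."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    # ASSUMPTION: longer audio is clipped to the cap rather than rejected.
    clipped = min(audio_seconds, MAX_SECONDS)
    # ASSUMPTION: a partial final frame is rounded up.
    return math.ceil(clipped * FPS)


print(expected_frames(4.0))   # 4 s of speech
print(expected_frames(45.0))  # capped at 30 s
```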

🎯 Prompt Engineering for Videos

Effective Prompt Structure:

[Character Description] + [Behavior/Action] + [Setting/Context]

Examples:

  • "A professional doctor explaining medical procedures with gentle hand gestures - white coat - modern clinic"
  • "An energetic fitness instructor demonstrating exercises - athletic wear - gym environment"
  • "A calm therapist providing advice with empathetic expressions - cozy office setting"

Tips for Better Videos:

  1. Be specific about appearance - clothing, hair, age, etc.
  2. Include desired actions - gesturing, pointing, demonstrating
  3. Specify the setting - office, classroom, studio, outdoor
  4. Mention emotion/tone - confident, friendly, professional, energetic
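
The prompt structure above can be assembled programmatically. This is a minimal sketch in which `build_prompt` is a hypothetical helper, not part of the app: it joins the character and behavior into one phrase, then appends appearance and setting details with the " - " separator used in the examples.

```python
def build_prompt(character: str, behavior: str, *details: str) -> str:
    """Compose a prompt as '<character> <behavior> - <detail> - <detail>'."""
    subject = " ".join(part.strip() for part in (character, behavior) if part.strip())
    if not subject:
        raise ValueError("a character description or behavior is required")
    # Skip empty details so optional fields do not leave dangling separators.
    extras = [detail.strip() for detail in details if detail.strip()]
    return " - ".join([subject, *extras])


prompt = build_prompt(
    "A professional doctor",
    "explaining medical procedures with gentle hand gestures",
    "white coat",
    "modern clinic",
)
print(prompt)
```

Keeping each component as a separate argument makes it easy to vary one element, such as the setting, while holding the character fixed.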

βš™οΈ Configuration

Video Quality Settings:

  • Guidance Scale: Controls prompt adherence (4-6 recommended)
  • Audio Scale: Controls lip-sync strength (3-5 recommended)
  • Steps: Quality vs speed trade-off (20-50 steps)
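
One way to keep these settings within their recommended ranges is to hold them in a small container that reports out-of-range values. The field names `guidance_scale`, `audio_scale`, and `num_steps` and the default values are assumptions for illustration; the app's actual parameter names may differ.

```python
from dataclasses import dataclass

# Recommended ranges quoted in the settings list above: (low, high), inclusive.
RECOMMENDED = {
    "guidance_scale": (4, 6),
    "audio_scale": (3, 5),
    "num_steps": (20, 50),
}


@dataclass(frozen=True)
class GenerationSettings:
    # ASSUMPTION: field names and defaults are hypothetical.
    guidance_scale: float = 5.0
    audio_scale: float = 4.0
    num_steps: int = 30

    def out_of_range(self) -> list:
        """Return the names of settings outside their recommended range."""
        return [
            name
            for name, (low, high) in RECOMMENDED.items()
            if not low <= getattr(self, name) <= high
        ]


print(GenerationSettings().out_of_range())
print(GenerationSettings(guidance_scale=9.0, num_steps=10).out_of_range())
```

Reporting rather than rejecting out-of-range values leaves room for experimentation, since the ranges are recommendations and not hard limits.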

Performance:

  • GPU Accelerated: Optimized for A10G hardware
  • Generation Time: ~30-60 seconds per video
  • Quality: Professional 480p output with smooth animation

🔧 Technical Details

Model Architecture:

  • Base: Wan2.1-T2V-14B for text-to-video generation
  • Avatar: OmniAvatar-14B LoRA weights for character animation
  • Audio: wav2vec2-base-960h for speech feature extraction
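
To give a sense of the audio features involved, the sketch below computes how many feature vectors the wav2vec2-base convolutional encoder emits for a clip of 16 kHz audio, using that model's published kernel sizes and strides. The combined stride is 320 samples (20 ms), so there are roughly two audio features per 25fps video frame. How this application pairs features with frames is not stated here, so treat the alignment as an assumption.

```python
# Kernel sizes and strides of the wav2vec2-base convolutional feature encoder.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # wav2vec2-base-960h expects 16 kHz audio


def wav2vec2_feature_count(num_samples: int) -> int:
    """Number of feature vectors produced for `num_samples` audio samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return max(length, 0)


one_second = wav2vec2_feature_count(SAMPLE_RATE)
print(f"{one_second} features per second of audio")
```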

Capabilities:

  • Audio-driven facial animation with precise lip-sync
  • Adaptive body gestures based on speech content
  • Character consistency with reference images
  • High-quality 480p video output at 25fps

💡 Important Notes

This is a VIDEO Generation Application:

  • 🎬 Primary Output: MP4 avatar videos with animation
  • 🎤 Audio Input: Text-to-speech or direct audio files
  • 🎯 Core Feature: Adaptive body animation synchronized with speech
  • ✨ Advanced: Reference image support for character consistency

🎬 This application creates AVATAR VIDEOS with adaptive body animation - professional quality video generation!