Escoffier Text-to-Video Generation
This repository contains the necessary steps and scripts to generate anime-style videos using the Escoffier text-to-video model with LoRA (Low-Rank Adaptation) weights. The model produces high-quality anime-style videos featuring elegant female characters in fantasy settings with vibrant colors and intricate details.
Prerequisites
Before proceeding, ensure that you have the following installed on your system:
• Ubuntu (or a compatible Linux distribution)
• Python 3.x
• pip (Python package manager)
• Git
• Git LFS (Git Large File Storage)
• FFmpeg
Installation
Update and Install Dependencies
sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg
Clone the Repository
git clone https://huggingface.co/svjack/Escoffier_wan_2_1_1_3_B_text2video_lora
cd Escoffier_wan_2_1_1_3_B_text2video_lora
Install Python Dependencies
pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install moviepy==1.0.3
pip install sageattention==1.0.6
Download Model Weights
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
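Interrupted downloads are easy to miss, so it can help to confirm that the four files exist before running the generator. The snippet below is a minimal sketch: the file names come from the wget commands above, while the helper name missing_weights and the "non-empty file" test are illustrative assumptions, not part of this repository.

```python
from pathlib import Path

# File names taken from the wget commands above.
EXPECTED_WEIGHTS = [
    "models_t5_umt5-xxl-enc-bf16.pth",
    "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    "Wan2.1_VAE.pth",
    "wan2.1_t2v_1.3B_bf16.safetensors",
]


def missing_weights(directory="."):
    """Return the expected weight files that are absent or empty in `directory`.

    Hypothetical helper: it only checks presence and a non-zero size,
    not checksums, so a truncated download can still pass.
    """
    root = Path(directory)
    missing = []
    for name in EXPECTED_WEIGHTS:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_weights()
    if absent:
        print("Missing or empty weight files:")
        for name in absent:
            print("  " + name)
    else:
        print("All expected weight files are present.")
```

Run it from the directory that holds the downloads; any file it lists should be fetched again.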
Usage
To generate a video, run the wan_generate_video.py script with the appropriate parameters. The examples below demonstrate the Escoffier aesthetic:
Stand Scene
python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 35 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Escoffier_w1_3_outputs/Escoffier_w1_3_lora-000050.safetensors \
--lora_multiplier 1.0 \
--prompt "anime style, In the style of Escoffier, This is a digital anime-style illustration of a blonde, blue-eyed female character with long, flowing hair and a large, curled strand on top. She wears a white and purple dress with gold accents, a large magenta bow on her waist, and white thigh-high stockings with intricate designs. She has a white frilled hat with a pink ribbon. The background features glowing, crystal-like structures and a dark blue, starry sky. Her expression is gentle, and she holds up the hem of her skirt with her right hand. The overall style is vibrant and dynamic, with a focus on her detailed, fantasy-inspired outfit and the magical, ethereal setting."
Mystical Garden Scene
python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 35 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Escoffier_w1_3_outputs/Escoffier_w1_3_lora-000050.safetensors \
--lora_multiplier 1.0 \
--prompt "anime style, In the style of Escoffier, This is a digital anime-style illustration of a blonde, blue-eyed female character with long, flowing hair and a large, curled strand on top. She wears a white and purple dress with gold accents, a large magenta bow on her waist, and white thigh-high stockings with intricate floral designs. She stands gracefully in a mystical garden filled with floating crystal butterflies and glowing lilies, reaching out to touch a shimmering orb."
Interactive Mode
For experimenting with different prompts:
python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 35 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Escoffier_w1_3_outputs/Escoffier_w1_3_lora-000050.safetensors \
--lora_multiplier 1.0 \
--interactive
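When generating several clips from a script, it may be convenient to assemble the same command in Python instead of retyping it. The following is a sketch only: every flag and default value is copied from the examples above, but the helpers build_command and generate are hypothetical, and treating --prompt and --interactive as mutually exclusive is a simplification of this sketch rather than a documented rule of the script.

```python
import subprocess
import sys


def build_command(prompt=None, interactive=False, infer_steps=35,
                  lora_weight="Escoffier_w1_3_outputs/Escoffier_w1_3_lora-000050.safetensors",
                  lora_multiplier=1.0, save_path="save"):
    """Assemble the argument list used in the examples above (hypothetical helper)."""
    # This sketch accepts exactly one of: a prompt, or interactive mode.
    if interactive == (prompt is not None):
        raise ValueError("pass either a prompt or interactive=True (exactly one)")
    cmd = [
        sys.executable, "wan_generate_video.py",
        "--fp8", "--task", "t2v-1.3B",
        "--video_size", "480", "832",
        "--video_length", "81",
        "--infer_steps", str(infer_steps),
        "--save_path", save_path, "--output_type", "both",
        "--dit", "wan2.1_t2v_1.3B_bf16.safetensors",
        "--vae", "Wan2.1_VAE.pth",
        "--t5", "models_t5_umt5-xxl-enc-bf16.pth",
        "--attn_mode", "torch",
        "--lora_weight", lora_weight,
        "--lora_multiplier", str(lora_multiplier),
    ]
    cmd += ["--interactive"] if interactive else ["--prompt", prompt]
    return cmd


def generate(**kwargs):
    """Run the generator as a subprocess and raise if it exits with an error."""
    subprocess.run(build_command(**kwargs), check=True)
```

Calling generate(prompt="anime style, In the style of Escoffier, ...") from the repository directory would then launch one run. Passing the arguments as a list avoids shell quoting problems with long prompts.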
Key Parameters
• --fp8: Enable FP8 precision (recommended)
• --task: Model version (t2v-1.3B)
• --video_size: Output resolution (e.g., 480 832)
• --video_length: Number of frames (typically 81)
• --infer_steps: Quality vs. speed trade-off (35-50)
• --lora_weight: Path to Escoffier LoRA weights
• --lora_multiplier: Strength of LoRA effect (1.0 recommended)
• --prompt: Should include "In the style of Escoffier" for best results
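To get a feel for what a given --video_length means in practice, the frame count can be converted to a clip duration. This sketch rests on two assumptions that this README does not state: that the output is written at 16 frames per second, and that the model expects frame counts of the form 4n + 1 (81 = 4 x 20 + 1). Both function names are illustrative; check the script's own options before relying on either assumption.

```python
def clip_seconds(video_length, fps=16.0):
    """Duration in seconds of `video_length` frames played at `fps`.

    The default of 16 fps is an assumption, not a value from this README.
    """
    if video_length < 1 or fps <= 0:
        raise ValueError("video_length must be >= 1 and fps must be > 0")
    return video_length / fps


def nearest_valid_length(frames):
    """Round `frames` to the nearest count of the form 4n + 1 (assumed constraint)."""
    n = max(0, round((frames - 1) / 4))
    return 4 * n + 1


if __name__ == "__main__":
    frames = nearest_valid_length(81)
    print(f"{frames} frames is about {clip_seconds(frames):.1f} s at 16 fps")
```

Under these assumptions the typical value of 81 frames corresponds to a clip of roughly five seconds.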
Style Characteristics
For optimal results, prompts should describe:
- Elegant female characters with blonde hair and blue eyes
- Detailed fantasy outfits with bows, ribbons and embroidery
- Magical settings like gardens, ballrooms or celestial spaces
- Pastel color palettes with gold and purple accents
- Graceful poses and serene expressions
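The example prompts above share one shape: the tag "anime style", the phrase "In the style of Escoffier", and then a few descriptive sentences. A small helper can keep that shape consistent across prompts. This is a sketch under that reading of the examples; build_prompt and TRIGGER are hypothetical names, not part of the repository.

```python
TRIGGER = "In the style of Escoffier"


def build_prompt(*sentences, prefix="anime style"):
    """Join descriptive sentences behind the style prefix and trigger phrase.

    Hypothetical helper that mirrors the layout of the example prompts.
    """
    body = " ".join(s.strip() for s in sentences if s.strip())
    if not body:
        raise ValueError("provide at least one descriptive sentence")
    return f"{prefix}, {TRIGGER}, {body}"


if __name__ == "__main__":
    print(build_prompt(
        "This is a digital anime-style illustration of a blonde, blue-eyed female character.",
        "She wears a white and purple dress with gold accents and a large magenta bow.",
        "She stands gracefully in a moonlit ballroom with a serene expression.",
    ))
```

The resulting string can be passed directly as the value of --prompt.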
Output
Generated videos and frames are saved in the directory specified by --save_path, which will contain:
- MP4 video file
- Individual frames as PNG images
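After a run, the results can be gathered programmatically. The exact file names and sub-directory layout are not described here, so this sketch assumes nothing beyond the file extensions and searches the output directory recursively; collect_outputs is a hypothetical helper.

```python
from pathlib import Path


def collect_outputs(save_path="save"):
    """Return (videos, frames): sorted .mp4 and .png paths found under `save_path`.

    The search is recursive because the directory layout is an unknown here.
    """
    root = Path(save_path)
    if not root.is_dir():
        return [], []
    videos = sorted(root.rglob("*.mp4"))
    frames = sorted(root.rglob("*.png"))
    return videos, frames


if __name__ == "__main__":
    videos, frames = collect_outputs("save")
    print(f"{len(videos)} video file(s), {len(frames)} frame image(s)")
    for video in videos:
        print("  " + str(video))
```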
Troubleshooting
• Verify all model weights are correctly downloaded
• Ensure sufficient GPU memory (at least 12 GB recommended)
• Check for version conflicts in Python packages
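For the package check, the installed versions can be compared against the pins from the installation step using only the standard library. This is a sketch: the pinned versions are the ones given in the pip commands above, while version_report and installed_version are hypothetical helpers, and a mismatch is only a hint to investigate, not proof of a conflict.

```python
from importlib import metadata

# Versions pinned in the installation commands above.
PINNED = {"moviepy": "1.0.3", "sageattention": "1.0.6"}
# Packages installed without a pinned version.
UNPINNED = ["torch", "torchvision"]


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_report(pinned=PINNED, unpinned=UNPINNED):
    """Return human-readable problems; the list is empty when nothing looks off."""
    problems = []
    for name, wanted in pinned.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name} is not installed (expected {wanted})")
        elif found != wanted:
            problems.append(f"{name} is {found}, expected {wanted}")
    for name in unpinned:
        if installed_version(name) is None:
            problems.append(f"{name} is not installed")
    return problems


if __name__ == "__main__":
    for line in version_report() or ["No obvious version problems found."]:
        print(line)
```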
License
This project is licensed under the MIT License.
Acknowledgments
• Hugging Face for model hosting
• Wan-AI for base models
• svjack for LoRA adaptation
For support, please open an issue in the repository.