3D Chibi Text-to-Video Generation
This repository contains the steps and scripts needed to generate anime-style videos with the 3D Chibi LoRA (Low-Rank Adaptation) weights for the Wan 2.1 1.3B text-to-video model. The model produces 3D chibi-style videos from text prompts, emphasizing vibrant aesthetics, character expressions, and dynamic scenes.
Prerequisites
Before proceeding, ensure that you have the following installed on your system:
- Ubuntu (or a compatible Linux distribution)
- Python 3.x
- pip (Python package manager)
- Git
- Git LFS (Git Large File Storage)
- FFmpeg
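The command-line prerequisites above can be checked before installation. The following is a minimal sketch, assuming the tools are exposed on PATH under the executable names listed in `REQUIRED_TOOLS` (these names are an assumption and may differ on your distribution); the helper `missing_tools` is hypothetical and not part of this repository.

```python
import shutil

# Executable names assumed for the prerequisites listed above.
# "git-lfs" is the binary behind the `git lfs` subcommand.
REQUIRED_TOOLS = ["python3", "pip", "git", "git-lfs", "ffmpeg"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All prerequisite tools were found on PATH.")
```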
Installation
Update and Install Dependencies
sudo apt-get update && sudo apt-get install build-essential git-lfs ffmpeg
Clone the Repository
git clone https://huggingface.co/svjack/3D_Chibi_wan_2_1_1_3_B_text2video_lora
cd 3D_Chibi_wan_2_1_1_3_B_text2video_lora
Install Python Dependencies
pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install moviepy==1.0.3
pip install sageattention==1.0.6
Download Model Weights
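Note: Previous results can be viewed in the respective repositories. The commands below download the LoRA weights, the base diffusion model, the VAE, and the text and image encoders: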
wget https://huggingface.co/svjack/3D_Chibi_wan_2_1_1_3_B_text2video_lora/resolve/main/3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors
wget https://huggingface.co/svjack/Kinich_wan_2_1_1_3_B_text2video_lora/resolve/main/Kinich_w1_3_outputs/Kinich_w1_3_lora-000070.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
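As an alternative to wget, the same files could be fetched with the `huggingface_hub` package installed earlier. The sketch below is an illustration, not the author's method: the helpers `split_resolve_url` and `download_all` are hypothetical, and the URL layout `<owner>/<repo>/resolve/<revision>/<path>` is assumed from the links above.

```python
from urllib.parse import urlparse

# The six download URLs from the wget commands above.
WEIGHT_URLS = [
    "https://huggingface.co/svjack/3D_Chibi_wan_2_1_1_3_B_text2video_lora/resolve/main/3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors",
    "https://huggingface.co/svjack/Kinich_wan_2_1_1_3_B_text2video_lora/resolve/main/Kinich_w1_3_outputs/Kinich_w1_3_lora-000070.safetensors",
    "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors",
    "https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth",
    "https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth",
    "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
]


def split_resolve_url(url):
    """Split a 'resolve' URL into (repo_id, revision, path_in_repo)."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed layout: <owner>/<repo>/resolve/<revision>/<path...>
    if len(parts) < 5 or parts[2] != "resolve":
        raise ValueError(f"not a resolve URL: {url}")
    return "/".join(parts[:2]), parts[3], "/".join(parts[4:])


def download_all(urls=WEIGHT_URLS, local_dir="."):
    """Download every file; requires network access and huggingface_hub."""
    from huggingface_hub import hf_hub_download  # imported lazily

    for url in urls:
        repo_id, revision, filename = split_resolve_url(url)
        hf_hub_download(
            repo_id=repo_id,
            filename=filename,
            revision=revision,
            local_dir=local_dir,
        )


if __name__ == "__main__":
    for url in WEIGHT_URLS:
        print(split_resolve_url(url))
```

Unlike wget, which saves each file under its base name in the current directory, `hf_hub_download` with `local_dir` keeps the path inside the repository (for example `Kinich_w1_3_outputs/...`), so the file paths passed to the generation script may need adjusting.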
Usage
To generate a video, run the wan_generate_video.py script with the appropriate parameters.
Example 1: Mixed Style with Kinich
python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 35 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight 3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors Kinich_w1_3_lora-000070.safetensors \
--lora_multiplier 1.0 \
--interactive
Prompt
"3D Chibi Style, anime style, In the style of Kinich, This is a digital anime-style illustration featuring a young male character with teal and dark blue, tousled hair adorned with geometric, neon-colored patterns. He has large, expressive green eyes and a slight, confident smile. He is wearing a black, form-fitting outfit with gold and teal geometric designs. The background depicts a high-energy action sequence set in a partially destroyed urban landscape. Explosions of glowing energy ripple through the air, and fragments of debris float around him as he levitates slightly, surrounded by swirling particles of light."
[Video: result without the 3D_Chibi LoRA]
[Video: result with the 3D_Chibi LoRA]
Example 2: Mixed Style with Escoffier
Note: You can find the Escoffier LoRA weights here: Escoffier_wan_2_1_1_3_B_text2video_lora
python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 35 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight 3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors Escoffier_w1_3_outputs/Escoffier_w1_3_lora-000050.safetensors \
--lora_multiplier 1.0 \
--interactive
Prompt
"3D Chibi Style, anime style, In the style of Escoffier, This is a digital anime-style illustration of a blonde, blue-eyed female character with long, flowing hair and a large, curled strand on top. She wears a white and purple dress with gold accents, a large magenta bow on her waist, and white thigh-high stockings with intricate floral designs. She stands gracefully in a mystical garden filled with floating crystal butterflies and glowing lilies, reaching out to touch a shimmering orb."
[Video: result without the 3D_Chibi LoRA]
[Video: result with the 3D_Chibi LoRA]
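The two examples differ only in the LoRA files passed to --lora_weight, so the command can be assembled programmatically, for instance to reproduce the with/without comparison. This is a minimal sketch: `build_command` is a hypothetical helper, and it only uses the flags and file names that appear in the example commands above.

```python
import shlex


def build_command(lora_weights, infer_steps=35, save_path="save"):
    """Assemble the example command as an argument list for subprocess."""
    return [
        "python", "wan_generate_video.py",
        "--fp8",
        "--task", "t2v-1.3B",
        "--video_size", "480", "832",
        "--video_length", "81",
        "--infer_steps", str(infer_steps),
        "--save_path", save_path,
        "--output_type", "both",
        "--dit", "wan2.1_t2v_1.3B_bf16.safetensors",
        "--vae", "Wan2.1_VAE.pth",
        "--t5", "models_t5_umt5-xxl-enc-bf16.pth",
        "--attn_mode", "torch",
        "--lora_weight", *lora_weights,
        "--lora_multiplier", "1.0",
        "--interactive",
    ]


CHIBI = "3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors"
ESCOFFIER = "Escoffier_w1_3_outputs/Escoffier_w1_3_lora-000050.safetensors"

if __name__ == "__main__":
    # Print both variants; pass a list to subprocess.run(...) to execute one.
    print(shlex.join(build_command([ESCOFFIER])))         # character LoRA only
    print(shlex.join(build_command([CHIBI, ESCOFFIER])))  # with the 3D_Chibi LoRA
```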
Key Parameters
| Parameter | Description |
|---|---|
| --fp8 | Enable FP8 precision for improved performance |
| --task | Model version (t2v-1.3B) |
| --video_size | Output resolution (e.g., 480 832) |
| --video_length | Number of frames (typically 81) |
| --infer_steps | Trade-off between quality and speed (35 to 50 recommended) |
| --lora_weight | Path to LoRA weight files (can specify multiple) |
| --lora_multiplier | Strength of LoRA effect (default: 1.0) |
| --prompt | Include style keywords like "In the style of Kinich" for better results |
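To relate --video_length to playback time, the frame count can be divided by the output frame rate. The sketch below assumes 16 frames per second; this README does not state the frame rate, so treat `ASSUMED_FPS` as a placeholder and check the metadata of a generated file.

```python
# Assumption: 16 fps is NOT stated in this README; verify against your output.
ASSUMED_FPS = 16


def clip_seconds(video_length, fps=ASSUMED_FPS):
    """Return the clip duration in seconds for a given number of frames."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return video_length / fps


if __name__ == "__main__":
    for frames in (41, 81):
        print(f"{frames} frames at {ASSUMED_FPS} fps: {clip_seconds(frames):.2f} s")
```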
Style Characteristics
For optimal results, prompts should emphasize:
- Chibi-style characters with exaggerated heads and facial expressions
- Vibrant colors and dynamic lighting effects
- Fantasy or magical settings (e.g., gardens, castles, floating islands)
- Neon or glowing elements, especially in futuristic or energetic scenes
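Both example prompts open with the same style keywords followed by a scene description. A small helper can keep that prefix consistent; `build_prompt` below is a hypothetical sketch that mirrors the prompt pattern of the examples, not a function from this repository.

```python
def build_prompt(description, character=None):
    """Prefix a scene description with the style keywords used in the examples."""
    tags = ["3D Chibi Style", "anime style"]
    if character:
        # Character LoRAs in the examples are triggered with this phrase.
        tags.append(f"In the style of {character}")
    return ", ".join(tags + [description.strip()])


if __name__ == "__main__":
    print(build_prompt(
        "A chibi character with an oversized head laughs in a glowing "
        "garden on a floating island, lit by neon lanterns.",
        character="Kinich",
    ))
```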
Output
Generated videos and frames are saved in the directory given by --save_path, including:
- MP4 video file
- Individual frames as PNG images
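After a run, the outputs can be collected for inspection. The following sketch searches the save directory recursively because the exact file names and subfolder layout are not documented here; `list_outputs` is a hypothetical helper.

```python
from pathlib import Path


def list_outputs(save_path="save"):
    """Return (videos, frames): sorted MP4 and PNG paths under save_path."""
    root = Path(save_path)
    videos = sorted(root.rglob("*.mp4"))
    frames = sorted(root.rglob("*.png"))
    return videos, frames


if __name__ == "__main__":
    videos, frames = list_outputs("save")
    print(f"{len(videos)} video(s), {len(frames)} frame image(s)")
    for video in videos:
        print(video)
```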
Troubleshooting
- Ensure all model weights are correctly downloaded and placed in the right directories.
- Check GPU memory availability; at least 12GB VRAM is recommended.
- Verify that no conflicts exist between Python packages by running pip check.
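The first two checks can be scripted. This is a minimal sketch, assuming the files sit at the paths used in the example commands, relative to the repository root; `missing_files` and `total_vram_gib` are hypothetical helpers, and the VRAM check needs the torch package installed earlier.

```python
from pathlib import Path

# Paths as they appear in the example commands (relative to the repo root).
EXPECTED_FILES = [
    "wan2.1_t2v_1.3B_bf16.safetensors",
    "Wan2.1_VAE.pth",
    "models_t5_umt5-xxl-enc-bf16.pth",
    "3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors",
]


def missing_files(paths=EXPECTED_FILES, base_dir="."):
    """Return the expected files that do not exist under base_dir."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


def total_vram_gib(device=0):
    """Return total GPU memory in GiB, or None if CUDA is unavailable."""
    import torch  # imported lazily so the file check works without torch

    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory / 1024**3


if __name__ == "__main__":
    for path in missing_files():
        print("Missing:", path)
```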
License
This project is licensed under the MIT License.
Acknowledgments
- Hugging Face: For hosting the model and dataset repositories
- Wan-AI: For providing base diffusion models
- svjack: For adapting and sharing LoRA weights for various styles
For support or feedback, please open an issue in this repository.