3D Chibi Text-to-Video Generation

This repository contains the necessary steps and scripts to generate anime-style videos using the 3D Chibi text-to-video model with LoRA (Low-Rank Adaptation) weights. The model produces high-quality 3D chibi-style videos based on textual prompts, emphasizing vibrant aesthetics, character expressions, and dynamic scenes.

Prerequisites

Before proceeding, ensure that you have the following installed on your system:

Ubuntu (or a compatible Linux distribution)
Python 3.x
pip (Python package manager)
Git
Git LFS (Git Large File Storage)
FFmpeg
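
One quick way to confirm that the command-line tools above are installed is to look them up on the PATH. The sketch below is illustrative and not part of this repository: the helper name `find_missing_tools` is hypothetical, and the executable names (for example `python3`, `pip`, and `git-lfs`) are assumptions that may differ on your system.

```python
import shutil

# Executable names assumed to correspond to the prerequisites listed above.
REQUIRED_TOOLS = ["python3", "pip", "git", "git-lfs", "ffmpeg"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
print("Missing tools:", ", ".join(missing) if missing else "none")
```

Any name printed as missing can usually be installed with the apt-get command in the Installation section.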

Installation

Update and Install Dependencies

sudo apt-get update && sudo apt-get install build-essential git-lfs ffmpeg

Clone the Repository

git clone https://huggingface.co/svjack/3D_Chibi_wan_2_1_1_3_B_text2video_lora
cd 3D_Chibi_wan_2_1_1_3_B_text2video_lora

Install Python Dependencies

pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install moviepy==1.0.3
pip install sageattention==1.0.6

Download Model Weights

📌 Note: The commands below download the LoRA and base model weights into the current directory. Previous results can be viewed in the respective LoRA repositories.

wget https://huggingface.co/svjack/3D_Chibi_wan_2_1_1_3_B_text2video_lora/resolve/main/3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors
wget https://huggingface.co/svjack/Kinich_wan_2_1_1_3_B_text2video_lora/resolve/main/Kinich_w1_3_outputs/Kinich_w1_3_lora-000070.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
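
Because `huggingface_hub` is installed in the previous step, the same files could also be fetched from Python. This is a minimal sketch, not the repository's own method: `split_resolve_url` and `download` are hypothetical helper names, while `hf_hub_download` is the download function provided by `huggingface_hub`.

```python
def split_resolve_url(url):
    """Split a Hugging Face 'resolve' URL into (repo_id, revision, filename)."""
    prefix = "https://huggingface.co/"
    if not url.startswith(prefix) or "/resolve/" not in url:
        raise ValueError(f"not a Hugging Face resolve URL: {url}")
    repo_id, rest = url[len(prefix):].split("/resolve/", 1)
    revision, filename = rest.split("/", 1)
    return repo_id, revision, filename


def download(url, local_dir="."):
    """Download one file into local_dir; requires the huggingface_hub package."""
    from huggingface_hub import hf_hub_download  # imported lazily on first use

    repo_id, revision, filename = split_resolve_url(url)
    return hf_hub_download(
        repo_id=repo_id, filename=filename, revision=revision, local_dir=local_dir
    )
```

Unlike `wget`, which saves every file directly into the current directory, `hf_hub_download` with `local_dir` keeps the sub-folder from the repository (for example `Kinich_w1_3_outputs/`), so the paths passed to `--lora_weight` may need adjusting.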

Usage

To generate a video, use the wan_generate_video.py script with the appropriate parameters.

Example 1: Mixed Style with Kinich

python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 35 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight 3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors Kinich_w1_3_lora-000070.safetensors \
--lora_multiplier 1.0 \
--interactive

Prompt

"3D Chibi Style, anime style, In the style of Kinich, This is a digital anime-style illustration featuring a young male character with teal and dark blue, tousled hair adorned with geometric, neon-colored patterns. He has large, expressive green eyes and a slight, confident smile. He is wearing a black, form-fitting outfit with gold and teal geometric designs. The background depicts a high-energy action sequence set in a partially destroyed urban landscape. Explosions of glowing energy ripple through the air, and fragments of debris float around him as he levitates slightly, surrounded by swirling particles of light."

[Video: result without the 3D_Chibi LoRA]

[Video: result with the 3D_Chibi LoRA]

Example 2: Mixed Style with Escoffier

📌 The Escoffier LoRA weights are not included in the download step above. You can find them in the Escoffier_wan_2_1_1_3_B_text2video_lora repository.

python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 35 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight 3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors Escoffier_w1_3_outputs/Escoffier_w1_3_lora-000050.safetensors \
--lora_multiplier 1.0 \
--interactive

Prompt

"3D Chibi Style, anime style, In the style of Escoffier, This is a digital anime-style illustration of a blonde, blue-eyed female character with long, flowing hair and a large, curled strand on top. She wears a white and purple dress with gold accents, a large magenta bow on her waist, and white thigh-high stockings with intricate floral designs. She stands gracefully in a mystical garden filled with floating crystal butterflies and glowing lilies, reaching out to touch a shimmering orb."

[Video: result without the 3D_Chibi LoRA]

[Video: result with the 3D_Chibi LoRA]

Key Parameters

Parameter	Description
`--fp8`	Enable FP8 precision to reduce GPU memory usage (may slightly reduce output quality)
`--task`	Task and model size (`t2v-1.3B`: text-to-video, 1.3B parameters)
`--video_size`	Output resolution (e.g., `480 832`)
`--video_length`	Number of frames (typically 81)
`--infer_steps`	Number of inference steps; more steps improve quality but take longer (35–50 recommended)
`--lora_weight`	Path to LoRA weight files (can specify multiple)
`--lora_multiplier`	Strength of LoRA effect (default: 1.0)
`--prompt`	Text prompt describing the video; include style keywords like `"In the style of Kinich"` for better results
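
To make the role of each parameter concrete, the following sketch assembles the command from the examples as a Python argument list. It is an illustration only: `build_command` is a hypothetical helper, the flags and file names are copied from the example commands above, and none of the values have been checked against the script's own defaults.

```python
import shlex


def build_command(lora_weights, lora_multiplier=1.0, infer_steps=35,
                  video_size=(480, 832), video_length=81, save_path="save"):
    """Assemble the wan_generate_video.py call from the examples as an argument list."""
    return [
        "python", "wan_generate_video.py", "--fp8",
        "--task", "t2v-1.3B",
        "--video_size", str(video_size[0]), str(video_size[1]),
        "--video_length", str(video_length),
        "--infer_steps", str(infer_steps),
        "--save_path", save_path, "--output_type", "both",
        "--dit", "wan2.1_t2v_1.3B_bf16.safetensors",
        "--vae", "Wan2.1_VAE.pth",
        "--t5", "models_t5_umt5-xxl-enc-bf16.pth",
        "--attn_mode", "torch",
        "--lora_weight", *lora_weights,  # one or more LoRA files
        "--lora_multiplier", str(lora_multiplier),
        "--interactive",
    ]


cmd = build_command([
    "3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors",
    "Kinich_w1_3_lora-000070.safetensors",
])
print(shlex.join(cmd))  # shell-quoted form of the command
```

The list can be passed to `subprocess.run(cmd, check=True)` to launch generation from a script rather than typing the command by hand.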

Style Characteristics

For optimal results, prompts should emphasize:

Chibi-style characters with exaggerated heads and facial expressions
Vibrant colors and dynamic lighting effects
Fantasy or magical settings (e.g., gardens, castles, floating islands)
Neon or glowing elements, especially in futuristic or energetic scenes
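
Both example prompts begin with the same style keywords before the scene description. A small helper can keep that prefix consistent across prompts. This is a sketch under that assumption: `build_prompt` is a hypothetical name, and the keyword order simply mirrors the example prompts above.

```python
def build_prompt(description, character=None):
    """Prefix a scene description with the style keywords used in the example prompts."""
    parts = ["3D Chibi Style", "anime style"]
    if character:
        parts.append(f"In the style of {character}")
    parts.append(description.strip())
    return ", ".join(parts)


print(build_prompt(
    "A chibi knight with an oversized head stands in a glowing garden.",
    character="Kinich",
))
```

Omitting `character` leaves out the "In the style of" keyword, which suits runs that load only the 3D_Chibi LoRA.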

Output

Generated videos and frames are saved in the directory given by --save_path (save in the examples above), including:

MP4 video file
Individual frames as PNG images
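
After a run, the output directory can be inspected from Python. The sketch below assumes the outputs use `.mp4` and `.png` extensions, as described above. `list_outputs` is a hypothetical helper, and it makes no assumption about how the script names its files.

```python
from pathlib import Path


def list_outputs(save_path="save"):
    """Return sorted lists of MP4 and PNG files found under save_path."""
    root = Path(save_path)
    if not root.is_dir():
        return [], []
    videos = sorted(root.rglob("*.mp4"))
    frames = sorted(root.rglob("*.png"))
    return videos, frames


videos, frames = list_outputs("save")
print(f"{len(videos)} video(s), {len(frames)} frame image(s)")
```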

Troubleshooting

Ensure all model weights are fully downloaded and that the paths passed to --dit, --vae, --t5, and --lora_weight match where the files are stored.
Check GPU memory availability; at least 12 GB of VRAM is recommended.
Verify that there are no conflicts between Python packages by running pip check.
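
The first two checks can be scripted. The sketch below is an illustration, not part of the repository: `missing_weights` and `vram_gib` are hypothetical helpers, and the file list is taken from the Example 1 command, so adjust it to the command you actually run.

```python
from pathlib import Path

# Weight files referenced by the Example 1 command (relative to the repository root).
EXPECTED_FILES = [
    "wan2.1_t2v_1.3B_bf16.safetensors",
    "Wan2.1_VAE.pth",
    "models_t5_umt5-xxl-enc-bf16.pth",
    "3D_Chibi_outputs/3D_Chibi_w1_3_lora-000065.safetensors",
    "Kinich_w1_3_lora-000070.safetensors",
]


def missing_weights(paths=EXPECTED_FILES, base_dir="."):
    """Return the expected weight files that do not exist under base_dir."""
    return [p for p in paths if not (Path(base_dir) / p).is_file()]


def vram_gib(total_bytes):
    """Convert a byte count (e.g. total GPU memory) to GiB."""
    return total_bytes / 1024**3


print("Missing weight files:", missing_weights() or "none")
```

With PyTorch installed and a CUDA device present, `vram_gib(torch.cuda.get_device_properties(0).total_memory)` reports the total memory of the first GPU, which can be compared with the 12 GB recommendation.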

License

This project is licensed under the MIT License.

Acknowledgments

Hugging Face – For hosting the model and dataset repositories
Wan-AI – For providing base diffusion models
svjack – For adapting and sharing LoRA weights for various styles

For support or feedback, please open an issue in this repository.