3D Chibi Text-to-Image (14B) Generation

This repository contains the steps and scripts for generating 3D chibi-style images with the Wan2.1 14B model and LoRA (Low-Rank Adaptation) weights. The base weights are the Wan2.1-T2V-14B checkpoints, run here with the t2i-14B task. The model produces 3D chibi-style illustrations from text prompts, with an emphasis on vibrant colors, expressive characters, and dynamic scenes.

🚀 This README uses text-to-image (t2i) generation for faster testing while staying compatible with future text-to-video (t2v) workflows.


Prerequisites

Before proceeding, ensure that you have the following installed on your system:

  • Ubuntu (or a compatible Linux distribution)
  • Python 3.x
  • pip (Python package manager)
  • Git
  • Git LFS (Git Large File Storage)
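
A quick way to confirm these tools are present before installing anything is a small check like the one below. This is a sketch and not part of the repository: the function name `missing_tools` and the `REQUIRED_TOOLS` list are hypothetical, and the check only looks for each executable on PATH.

```python
import shutil
import sys

# Executables the prerequisites list expects on PATH. The names are
# assumptions; Git LFS is normally installed as the `git-lfs` executable.
REQUIRED_TOOLS = ["git", "git-lfs", "pip"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if sys.version_info < (3,):
        print("Python 3.x is required")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

The `which` argument exists only so the lookup can be replaced in a test; in normal use the default `shutil.which` is enough.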

Installation

  1. Update and Install Dependencies

    sudo apt-get update && sudo apt-get install build-essential git-lfs
    
  2. Clone the Repository

    ⚠️ Note: You can use any existing Wan2.1-compatible repo structure or clone directly from Hugging Face.

    git clone https://huggingface.co/svjack/3D_Chibi_wan_2_1_14_B_text2video_lora
    cd 3D_Chibi_wan_2_1_14_B_text2video_lora
    
  3. Install Python Dependencies

    pip install torch torchvision
    pip install -r requirements.txt
    pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
    pip install sageattention==1.0.6
    
  4. Download Model Weights

    📌 Note: Previous results for each LoRA can be viewed in its repository on Hugging Face (the svjack URLs below).

    # Base Models
    wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
    
    # LoRA Weights
    wget https://huggingface.co/svjack/Xiang_Handsome_wan_2_1_14_B_text2video_lora/resolve/main/Xiang_Handsome_outputs/Xiang_Handsome_w14_lora-000067.safetensors
    wget https://huggingface.co/svjack/Taiga_Aisaka_wan_2_1_14_B_text2video_lora/resolve/main/Taiga_Aisaka_w14_outputs/Taiga_Aisaka_w14_lora-000010.safetensors
    wget https://huggingface.co/svjack/Sebastian_Michaelis_wan_2_1_14_B_text2video_lora/resolve/main/Sebastian_Michaelis_w14_outputs/Sebastian_Michaelis_w14_lora-000007.safetensors
    wget https://huggingface.co/svjack/3D_Chibi_wan_2_1_14_B_text2video_lora/resolve/main/3D_Chibi_w14_outputs/3D_Chibi_w14_lora-000024.safetensors
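
Because wget writes each file into the current directory under the last component of its URL, it can help to confirm that every expected file is present before running generation. The sketch below is illustrative only: `split_resolve_url` and `missing_downloads` are hypothetical helper names, and the URL layout they assume is the `<owner>/<repo>/resolve/<revision>/<path>` form used by the commands above.

```python
from pathlib import Path
from urllib.parse import urlparse


def split_resolve_url(url):
    """Split a Hugging Face `resolve` URL into (repo_id, path_in_repo)."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed layout: <owner>/<repo>/resolve/<revision>/<path...>
    if len(parts) < 5 or parts[2] != "resolve":
        raise ValueError(f"not a resolve URL: {url}")
    return "/".join(parts[:2]), "/".join(parts[4:])


def missing_downloads(urls, directory="."):
    """Return the file names wget would have written that are absent."""
    names = [Path(split_resolve_url(u)[1]).name for u in urls]
    return [n for n in names if not (Path(directory) / n).is_file()]


if __name__ == "__main__":
    urls = [
        "https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth",
    ]
    print("Missing:", missing_downloads(urls) or "none")
```

Extend the `urls` list with the other download links from this step to check all of them at once.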
    

Usage

To generate an image, use the wan_generate_video.py script with the --task t2i-14B parameter.

Example 1: Xiang InfiniteYou Handsome Style

python wan_generate_video.py --fp8 --task t2i-14B --video_size 480 832 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Xiang_Handsome_w14_lora-000067.safetensors 3D_Chibi_w14_lora-000024.safetensors \
--lora_multiplier 1.0 \
--interactive

Prompt

"3D Chibi Style, In the style of Xiang InfiniteYou Handsome, Xiang, a young person with short, black hair and glasses, stands in a quiet office space. The soft glow of a desk lamp casts a warm light across his thoughtful expression, while the hum of distant keyboards and the faint scent of coffee linger in the air. Outside the window, the city lights twinkle like distant stars, blending with the muted glow of computer screens as the workday stretches on around him."

-- without 3D_Chibi lora text2video output

-- with 3D_Chibi lora text2image output

-- with 3D_Chibi lora text2video output


Example 2: Taiga Aisaka Style

python wan_generate_video.py --fp8 --task t2i-14B --video_size 480 832 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Taiga_Aisaka_w14_lora-000010.safetensors 3D_Chibi_w14_lora-000024.safetensors \
--lora_multiplier 1.0 \
--interactive

Prompt

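"3D Chibi Style, 一个身穿红色高中校服的金发女孩,正在吃汉堡。"

(English translation of the prompt: "A blonde girl in a red high school uniform is eating a hamburger.")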

-- without 3D_Chibi lora text2video output

-- with 3D_Chibi lora text2image output


Example 3: Sebastian Michaelis (Black Butler) Style

python wan_generate_video.py --fp8 --task t2i-14B --video_size 480 832 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Sebastian_Michaelis_w14_lora-000007.safetensors 3D_Chibi_w14_lora-000024.safetensors \
--lora_multiplier 1.0 \
--interactive

Prompt

"3D Chibi Style, In the style of Black Butler, The video opens with a close-up of a character dressed in a black suit, white shirt, and black tie, who stands in a quiet office space. The soft glow of a desk lamp casts a warm light across his thoughtful expression, while the hum of distant keyboards and the faint scent of coffee linger in the air. Outside the window, the city lights twinkle like distant stars, blending with the muted glow of computer screens as the workday stretches on around him."

-- without 3D_Chibi lora text2video output

-- with 3D_Chibi lora text2image output


Key Parameters

Parameter           Description
--fp8               Enable FP8 precision (lowers memory use)
--task              Set to t2i-14B for image generation
--video_size        Output resolution as two integers (e.g., 480 832)
--infer_steps       Number of inference steps; trades speed against quality (20 recommended for a quick test)
--lora_weight       Path to one or more LoRA weight files
--lora_multiplier   Strength of the LoRA effect (default: 1.0)
--prompt            Prompt text; include "3D Chibi Style" for best results
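
To make it clearer how these parameters fit together, the sketch below assembles the same argument list used in the examples in this README. The helper name `build_command` and its keyword arguments are hypothetical; the flags and file names are copied from the examples above, and the model files are assumed to sit in the current directory.

```python
import shlex


def build_command(lora_weights, save_path="save", infer_steps=20,
                  video_size=(480, 832), lora_multiplier=1.0):
    """Assemble the argument list used by the examples in this README."""
    return [
        "python", "wan_generate_video.py", "--fp8",
        "--task", "t2i-14B",
        "--video_size", str(video_size[0]), str(video_size[1]),
        "--infer_steps", str(infer_steps),
        "--save_path", save_path, "--output_type", "both",
        "--dit", "wan2.1_t2v_14B_bf16.safetensors",
        "--vae", "Wan2.1_VAE.pth",
        "--t5", "models_t5_umt5-xxl-enc-bf16.pth",
        "--attn_mode", "torch",
        "--lora_weight", *lora_weights,
        "--lora_multiplier", str(lora_multiplier),
        "--interactive",
    ]


if __name__ == "__main__":
    cmd = build_command(["3D_Chibi_w14_lora-000024.safetensors"])
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when a path contains spaces.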

Style Characteristics

For optimal results, prompts should emphasize:

  • Chibi-style characters with exaggerated heads and facial expressions
  • Vibrant colors and dynamic lighting effects
  • Fantasy or magical settings (e.g., gardens, castles, floating islands)
  • Neon or glowing elements, especially in futuristic or energetic scenes
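
The guidance above can be applied mechanically when preparing many prompts. The following is a minimal sketch, assuming that the trigger phrase "3D Chibi Style" belongs at the start of the prompt as in the examples; `build_prompt` and `TRIGGER` are hypothetical names, not part of the generation script.

```python
TRIGGER = "3D Chibi Style"


def build_prompt(description, cues=()):
    """Prefix the trigger phrase and append optional style cues."""
    description = description.strip()
    # Add the trigger only if the description does not already start with it.
    if not description.lower().startswith(TRIGGER.lower()):
        description = f"{TRIGGER}, {description}"
    extras = [c.strip() for c in cues if c.strip()]
    return ", ".join([description, *extras])


if __name__ == "__main__":
    print(build_prompt("a knight in a floating castle",
                       ["vibrant colors", "glowing neon accents"]))
```

The resulting string can be pasted at the interactive prompt or passed with --prompt.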

Output

Generated images will be saved in the specified --save_path directory with:

  • PNG image file
  • (Optional) MP4 video (if --output_type both is used)
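
To review what a run produced, the save directory can be scanned for these file types. This is a sketch under the assumption that outputs carry `.png` and `.mp4` extensions as listed above; `list_outputs` is a hypothetical helper, and the exact file naming used by the script is not assumed.

```python
from pathlib import Path


def list_outputs(save_path="save"):
    """Return PNG and MP4 files under `save_path`, newest first."""
    root = Path(save_path)
    if not root.is_dir():
        return []
    files = [p for p in root.rglob("*")
             if p.is_file() and p.suffix.lower() in {".png", ".mp4"}]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_outputs("save"):
        print(path)
```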

Troubleshooting

  • Ensure all model weights are correctly downloaded and placed in the right directories.
  • Check GPU memory availability; at least 20 GB of VRAM is recommended for 14B models.
  • Verify no conflicts exist between Python packages using pip check.
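
The GPU memory check can be scripted. The sketch below assumes PyTorch is installed (as in the installation step) and treats the 20 GB recommendation as 20 GiB, which is an assumption; `enough_vram` and `first_gpu_memory` are hypothetical helper names.

```python
REQUIRED_GIB = 20  # taken from the recommendation above, read as GiB


def enough_vram(total_bytes, required_gib=REQUIRED_GIB):
    """True if `total_bytes` of GPU memory meets the recommended amount."""
    return total_bytes >= required_gib * 1024 ** 3


def first_gpu_memory():
    """Total memory of GPU 0 in bytes, or None if no CUDA device is usable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    total = first_gpu_memory()
    if total is None:
        print("No CUDA device detected")
    else:
        print(f"{total / 1024 ** 3:.1f} GiB, enough: {enough_vram(total)}")
```

This reports total memory only; memory already used by other processes is not accounted for.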

License

This project is licensed under the MIT License.


Acknowledgments

  • Hugging Face – For hosting the model and dataset repositories
  • Wan-AI – For providing base diffusion models
  • svjack – For adapting and sharing LoRA weights for various styles

For support or feedback, please open an issue in this repository.
