LTX-Video 13B Generation Pipeline (Ayu Tsukimiya)

This repository contains the workflow for generating animated videos featuring Ayu Tsukimiya (an orange-haired winged character) using Lightricks' LTX-Video model with LoRA adaptation. The pipeline generates a low-resolution draft, upscales it 2x in latent space, refines the result, and resizes every frame to a fixed 832x480 output.

Installation

sudo apt-get update && sudo apt-get install ffmpeg git-lfs cbm

pip install -U diffusers transformers torch sentencepiece peft moviepy protobuf
pip install git+https://github.com/Lightricks/LTX-Video.git
pip install git+https://github.com/huggingface/diffusers.git
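Before running the pipeline, it can help to confirm that the tools and packages installed above are actually visible. The following is a minimal sketch using only the standard library; check_environment and the two name lists are hypothetical helpers for illustration, not part of this repository.

```python
import importlib.util
import shutil

# Names taken from the install commands above (adjust to your setup).
REQUIRED_TOOLS = ["ffmpeg", "git-lfs"]
REQUIRED_MODULES = ["torch", "diffusers", "transformers", "peft", "sentencepiece"]


def check_environment(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return (missing_tools, missing_modules) as two lists of names."""
    # shutil.which returns None when an executable is not on PATH.
    missing_tools = [t for t in tools if shutil.which(t) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = check_environment()
    print("Missing tools:", tools or "none")
    print("Missing modules:", modules or "none")
```

An empty result for both lists suggests the install step completed, though it does not verify package versions.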

import torch
from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline
from diffusers.utils import export_to_video

# Initialize pipelines with Ayu Tsukimiya LoRA

pipe = LTXConditionPipeline.from_pretrained(
    "Lightricks/LTX-Video-0.9.7-dev",
    torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("LTXV_13B_097_DEV_Ayu_Tsukimiya_lora/lora_weights_step_25000.safetensors")
pipe_upsample = LTXLatentUpsamplePipeline.from_pretrained(
    "Lightricks/ltxv-spatial-upscaler-0.9.7",
    vae=pipe.vae,
    torch_dtype=torch.bfloat16
)

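# Memory optimization: sequential CPU offload keeps VRAM usage low but slows generation.
# If the GPU has enough memory, comment out the two offload calls below and
# move both pipelines to the GPU instead:
# pipe.to("cuda")
# pipe_upsample.to("cuda")
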
pipe.enable_sequential_cpu_offload()
pipe_upsample.enable_sequential_cpu_offload()

def generate_ayu_video(prompt, output_name):
    """Complete generation pipeline for 832x480 videos"""
    
    # Fixed resolution parameters
    expected_width, expected_height = 832, 480
    downscale_factor = 2/3
    num_frames = 121  # ~5 seconds at 24fps
    negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
    
    # 1. Calculate compatible low-res dimensions
    def round_resolution(h, w):
        ratio = pipe.vae_spatial_compression_ratio
        return h - (h % ratio), w - (w % ratio)
    
    low_res_h, low_res_w = round_resolution(
        int(expected_height * downscale_factor),
        int(expected_width * downscale_factor)
    )
    
    # 2. Initial generation at low resolution
    latents = pipe(
        conditions=None,
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=low_res_w,
        height=low_res_h,
        num_frames=num_frames,
        num_inference_steps=30,
        generator=torch.Generator().manual_seed(0),
        output_type="latent",
    ).frames

    # 3. Latent upscaling (2x)
    upscaled_latents = pipe_upsample(
        latents=latents,
        output_type="latent"
    ).frames

    # 4. Quality refinement
    video = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=low_res_w*2,  # 2x upscaled
        height=low_res_h*2,
        num_frames=num_frames,
        denoise_strength=0.4,  # effectively runs 4 of the 10 inference steps
        num_inference_steps=10,
        latents=upscaled_latents,
        decode_timestep=0.05,
        image_cond_noise_scale=0.025,
        generator=torch.Generator().manual_seed(0),
        output_type="pil",
    ).frames[0]

    # 5. Final resize to 832x480
    video = [frame.resize((expected_width, expected_height)) for frame in video]
    export_to_video(video, f"{output_name}.mp4", fps=24)
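The function above hard-codes num_frames = 121 for roughly 5 seconds at 24 fps. As far as the upstream LTX-Video README describes it, the model expects frame counts of the form 8k + 1, which 121 satisfies (8 x 15 + 1). The sketch below shows one way to pick a valid frame count for another duration; frames_for_duration is a hypothetical helper and the rounding policy (nearest valid count) is an assumption.

```python
def frames_for_duration(seconds, fps=24):
    """Return the frame count of the form 8k + 1 closest to seconds * fps."""
    target = seconds * fps
    # Solve 8k + 1 ~= target for k, keeping at least one block of 8 frames.
    k = max(1, round((target - 1) / 8))
    return 8 * k + 1


if __name__ == "__main__":
    # 5 seconds at 24 fps maps to the value used in generate_ayu_video.
    print(frames_for_duration(5))  # 121
```

Longer clips increase memory use and generation time, so it is worth testing a short duration first.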

Example Generations (All 832x480)

Snowy Tiled Floor Scene

generate_ayu_video(
    prompt="An animated character with orange hair and wings, smiling warmly while standing on a tiled floor with a snowy background, suggesting a cheerful and inviting atmosphere.",
    output_name="ayu_snowy_scene"
)

Ocean Cliff Sunset

generate_ayu_video(
    prompt="An animated character with orange hair and wings, smiling warmly while standing on a cliff overlooking a vast ocean at sunset. The sky is painted in warm hues of pink, orange, and gold, with gentle waves crashing against the rocks below.",
    output_name="ayu_ocean_sunset"
)

Autumn City Park

generate_ayu_video(
    prompt="An animated character with orange hair and wings, smiling warmly while sitting on a wooden bench in a bustling city park during autumn. Golden leaves gently fall around, and people stroll by enjoying the mild weather.",
    output_name="ayu_autumn_park"
)
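The three example prompts share the same character description and differ only in the scene. A minimal sketch of driving them from a single table is shown below; CHARACTER, SCENES, build_jobs, and run_all are hypothetical names introduced for illustration, and run_all expects a function with the same signature as generate_ayu_video.

```python
# Shared opening of every example prompt.
CHARACTER = "An animated character with orange hair and wings, smiling warmly while"

# Output name -> scene-specific continuation of the prompt.
SCENES = {
    "ayu_snowy_scene": (
        "standing on a tiled floor with a snowy background, "
        "suggesting a cheerful and inviting atmosphere."
    ),
    "ayu_ocean_sunset": (
        "standing on a cliff overlooking a vast ocean at sunset. "
        "The sky is painted in warm hues of pink, orange, and gold, "
        "with gentle waves crashing against the rocks below."
    ),
    "ayu_autumn_park": (
        "sitting on a wooden bench in a bustling city park during autumn. "
        "Golden leaves gently fall around, and people stroll by enjoying the mild weather."
    ),
}


def build_jobs(scenes=SCENES, character=CHARACTER):
    """Return a list of (prompt, output_name) pairs, one per scene."""
    return [(f"{character} {action}", name) for name, action in scenes.items()]


def run_all(generate, scenes=SCENES):
    """Call generate(prompt=..., output_name=...) once for each scene."""
    for prompt, name in build_jobs(scenes):
        generate(prompt=prompt, output_name=name)
```

Calling run_all(generate_ayu_video) would then produce the same three videos as the individual calls.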

Technical Specifications

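Resolution Handling

| Stage              | Resolution | Notes                                                                  |
|--------------------|------------|------------------------------------------------------------------------|
| Initial Generation | ~544x320   | 2/3 of target (~554x320), rounded down to a VAE-compatible size        |
| Latent Upscaling   | ~1088x640  | 2x upscaled                                                            |
| Final Output       | 832x480    | Fixed output size (roughly 16:9)                                       |

The intermediate sizes assume a VAE spatial compression ratio of 32; the code reads the actual value from pipe.vae_spatial_compression_ratio.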
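The stage resolutions follow from the arithmetic inside generate_ayu_video. The sketch below reproduces that arithmetic without loading any model, assuming a VAE spatial compression ratio of 32 (at runtime the value comes from pipe.vae_spatial_compression_ratio); staged_resolutions is a hypothetical helper, not part of the pipeline.

```python
def staged_resolutions(width=832, height=480, downscale=2 / 3, ratio=32):
    """Return the (width, height) used at each stage of the pipeline."""

    def round_down(value):
        # Drop the remainder so the size is a multiple of the compression ratio.
        return value - (value % ratio)

    low_w = round_down(int(width * downscale))
    low_h = round_down(int(height * downscale))
    return {
        "initial": (low_w, low_h),           # low-resolution first pass
        "upscaled": (low_w * 2, low_h * 2),  # after the 2x latent upsampler
        "final": (width, height),            # after the per-frame resize
    }


if __name__ == "__main__":
    for stage, size in staged_resolutions().items():
        print(f"{stage}: {size[0]}x{size[1]}")
```

Under these assumptions the first pass runs at 544x320 (the unrounded 2/3 width of 554 drops to the nearest multiple of 32) and the refinement pass at 1088x640, before the final resize to 832x480.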