A Mochi-1 LoRA tuned on
https://huggingface.co/datasets/svjack/video-dataset-Lily-Bikini-organized
to test whether Mochi-1 can learn a concept (an object or a person) from a tiny dataset (trained at low resolution).

Installation

pip install git+https://github.com/huggingface/diffusers.git peft transformers torch sentencepiece opencv-python
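After installing, a quick check like the one below can confirm that the packages are importable before downloading the model. This is a minimal sketch: the `missing_modules` helper and the `REQUIRED_MODULES` list are illustrative names, not part of any of these libraries, and the list simply mirrors the pip command above (note that `opencv-python` is imported as `cv2`).

```python
from importlib.util import find_spec

# Import names for the packages in the pip command above.
# opencv-python installs a module named cv2.
REQUIRED_MODULES = ["diffusers", "peft", "transformers", "torch", "sentencepiece", "cv2"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

An empty result only means the modules can be located; it does not verify that the installed diffusers build includes `MochiPipeline`.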

Example

Landscape Example

from diffusers import MochiPipeline
from diffusers.utils import export_to_video
import torch

pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype = torch.float16)
pipe.load_lora_weights("svjack/mochi_Lily_Bikini_early_lora")
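# The two CPU offload modes are alternatives, so only one is enabled here.
# Sequential offload uses the least GPU memory but is slower;
# use pipe.enable_model_cpu_offload() instead if more GPU memory is available.
pipe.enable_sequential_cpu_offload()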
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

# Fix the seed so the result is reproducible.
seed = 50
generator = torch.Generator("cpu").manual_seed(seed)
prompt = "Lily: The video features a woman with blonde hair wearing a black one-piece swimsuit. She is standing on a sandy beach with the ocean in the background, where waves are visible crashing onto the shore. The sky is clear with a few scattered clouds, suggesting a sunny day. The woman appears to be holding a piece of driftwood or a similar object in her right hand. Her stance and expression suggest she is posing for the camera."
pipeline_args = {
    "prompt": prompt,
    "num_inference_steps": 64,
    "height": 480,
    "width": 848,
    "max_sequence_length": 1024,
    "output_type": "np",
    "num_frames": 19,
    "generator": generator,
}

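# Run the pipeline and take the first (only) generated video.
video = pipe(**pipeline_args).frames[0]
export_to_video(video, "Lily_Lora.mp4")

# Optional: preview the result inline when running in a Jupyter/IPython notebook.
from IPython import display
display.clear_output(wait=True)
display.Video("Lily_Lora.mp4")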
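When changing `height`, `width`, or `num_frames` from the values used above, it can help to sanity-check them before starting a long generation. The sketch below is not part of diffusers; the helper names are hypothetical, and the two constants are assumptions (height and width divisible by 8, frame counts of the form 6k + 1 to match the Mochi VAE's temporal compression) that should be verified against the installed diffusers version.

```python
# Assumed constraints; verify against the installed diffusers version.
SPATIAL_MULTIPLE = 8   # assumed: height and width must be divisible by this
TEMPORAL_FACTOR = 6    # assumed: frame counts should have the form 6 * k + 1


def nearest_valid_num_frames(num_frames, temporal_factor=TEMPORAL_FACTOR):
    """Round down to the nearest frame count of the form temporal_factor * k + 1."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    return (num_frames - 1) // temporal_factor * temporal_factor + 1


def check_video_settings(height, width, num_frames):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if height % SPATIAL_MULTIPLE != 0:
        problems.append(f"height {height} is not divisible by {SPATIAL_MULTIPLE}")
    if width % SPATIAL_MULTIPLE != 0:
        problems.append(f"width {width} is not divisible by {SPATIAL_MULTIPLE}")
    if nearest_valid_num_frames(num_frames) != num_frames:
        problems.append(
            f"num_frames {num_frames} is not of the form {TEMPORAL_FACTOR} * k + 1; "
            f"consider {nearest_valid_num_frames(num_frames)}"
        )
    return problems


if __name__ == "__main__":
    # The settings used in the example above.
    print(check_video_settings(height=480, width=848, num_frames=19))
```

Under these assumptions, the settings from the example (480 x 848, 19 frames) pass all three checks.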

With LoRA

With LoRA + Upscale