---
license: apache-2.0
base_model: "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
tags:
  - WanPipeline
  - WanPipeline-diffusers
  - text-to-image
  - image-to-image
  - diffusers
  - simpletuner
  - not-for-all-audiences
  - lora
  - template:sd-lora
  - standard
pipeline_tag: text-to-image
inference: true
widget:
- text: 'A black and white animated scene unfolds featuring a distressed upright cow with prominent horns and expressive eyes, suspended by its legs from a hook on a static background wall. A smaller Mickey Mouse-like character enters, standing near a wooden bench, initiating interaction between the two. The cow''s posture changes as it leans, stretches, and falls, while the mouse watches with a concerned expression, its face a mixture of curiosity and worry, in a world devoid of color.'
  parameters:
    negative_prompt: '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走'
  output:
    url: ./assets/image_0_0.gif
---

# wan-disney-DCM-distilled

This is a standard PEFT LoRA derived from [Wan-AI/Wan2.1-T2V-1.3B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers).

The main validation prompt used during training was:
```
A black and white animated scene unfolds featuring a distressed upright cow with prominent horns and expressive eyes, suspended by its legs from a hook on a static background wall. A smaller Mickey Mouse-like character enters, standing near a wooden bench, initiating interaction between the two. The cow's posture changes as it leans, stretches, and falls, while the mouse watches with a concerned expression, its face a mixture of curiosity and worry, in a world devoid of color.
```

## Validation settings
- CFG: `1.0`
- CFG Rescale: `0.0`
- Steps: `8`
- Sampler: `FlowMatchEulerDiscreteScheduler`
- Seed: `42`
- Resolution: `832x480`

Note: The validation settings are not necessarily the same as the [training settings](#training-settings).
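One concrete difference between validation and training is the pixel budget. The short sketch below compares the validation resolution (`832x480`) with the `0.2304` megapixel figure listed for the dataset further down. The helper name `megapixels` is made up for this illustration, and the remark about 480x480 is only an observation about the pixel count: the card does not state the actual training frame dimensions.

```python
def megapixels(width: int, height: int) -> float:
    """Return the pixel area of a frame in megapixels (millions of pixels)."""
    return width * height / 1_000_000

# Values copied from this card.
validation_mp = megapixels(832, 480)   # validation resolution
training_mp = 0.2304                   # dataset resolution listed under "Datasets"

# Observation only: 0.2304 MP is the same pixel count as a 480x480 frame.
square_480_mp = megapixels(480, 480)

ratio = validation_mp / training_mp
print(f"validation: {validation_mp:.5f} MP, training: {training_mp:.4f} MP, ratio: {ratio:.2f}x")
```

Validation frames therefore carry more pixels than the training budget, which is worth keeping in mind when comparing outputs at other resolutions.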
You can find some example images in the following gallery:

The text encoder **was not** trained.
You may reuse the base model text encoder for inference.

## Training settings

- Training epochs: 0
- Training steps: 300
- Learning rate: 0.0001
  - Learning rate schedule: cosine
  - Warmup steps: 400000
- Max grad value: 0.01
- Effective batch size: 2
  - Micro-batch size: 2
  - Gradient accumulation steps: 1
  - Number of GPUs: 1
- Gradient checkpointing: True
- Prediction type: flow_matching (extra parameters=['shift=17.0'])
- Optimizer: adamw_bf16
- Trainable parameter precision: Pure BF16
- Base model precision: `int8-quanto`
- Caption dropout probability: 0.1%
- LoRA Rank: 128
- LoRA Alpha: 128.0
- LoRA Dropout: 0.1
- LoRA initialisation style: default

## Datasets

### disney-black-and-white-wan
- Repeats: 10
- Total number of images: 68
- Total number of aspect buckets: 1
- Resolution: 0.2304 megapixels
- Cropped: False
- Crop style: None
- Crop aspect: None
- Used for regularisation data: No

## Inference

```python
import torch
from diffusers import DiffusionPipeline

model_id = 'Wan-AI/Wan2.1-T2V-1.3B-Diffusers'
adapter_id = 'bghira/wan-disney-DCM-distilled'
pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16) # loading directly in bf16
pipeline.load_lora_weights(adapter_id)

prompt = "A black and white animated scene unfolds featuring a distressed upright cow with prominent horns and expressive eyes, suspended by its legs from a hook on a static background wall. A smaller Mickey Mouse-like character enters, standing near a wooden bench, initiating interaction between the two. The cow's posture changes as it leans, stretches, and falls, while the mouse watches with a concerned expression, its face a mixture of curiosity and worry, in a world devoid of color."
# Chinese-language negative prompt listing artifacts to avoid (overexposure, blurry details, low quality, malformed hands and faces, static frames, and so on).
negative_prompt = '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走'

## Optional: quantise the model to save on vram.
## Note: The model was quantised during training, and so it is recommended to do the same during inference time.
from optimum.quanto import quantize, freeze, qint8
quantize(pipeline.transformer, weights=qint8)
freeze(pipeline.transformer)

# Pick the best available device once and reuse it for the pipeline and the generator.
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
pipeline.to(device) # the pipeline is already in its target precision level
model_output = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=8,
    generator=torch.Generator(device=device).manual_seed(42),
    width=832,
    height=480,
    guidance_scale=1.0,
    output_type='pil',  # export_to_gif expects a list of PIL images
).frames[0]  # the Wan video pipeline returns its result in `.frames`, not `.images`

from diffusers.utils.export_utils import export_to_gif
export_to_gif(model_output, "output.gif", fps=15)
```
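
The numbers under "Training settings" and "Datasets" can be cross-checked with a little arithmetic. The sketch below reproduces the effective batch size and estimates how many samples the 300 steps covered. It assumes that `Repeats: 10` means each item is used `1 + 10` times per epoch and it ignores any bucket rounding; both are assumptions about the trainer, not something this card states.

```python
# Values copied from the "Training settings" and "Datasets" sections of this card.
micro_batch_size = 2
gradient_accumulation_steps = 1
num_gpus = 1
training_steps = 300
dataset_items = 68
repeats = 10

# Effective batch size is the product of the three batch-related settings.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# Total samples consumed over the whole run.
samples_seen = training_steps * effective_batch_size

# ASSUMPTION: repeats=10 means every item appears 1 + 10 times per epoch.
samples_per_epoch = dataset_items * (1 + repeats)

# Whole epochs finished before training stopped.
completed_epochs = samples_seen // samples_per_epoch

print(f"effective batch size: {effective_batch_size}")
print(f"samples seen: {samples_seen} of {samples_per_epoch} per epoch")
print(f"completed epochs: {completed_epochs}")
```

Under that assumption the run stopped before finishing a single pass over the repeated dataset, which would be consistent with the card listing `Training epochs: 0`.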