bghira committed on
Commit e9bf10f · verified · 1 Parent(s): bc94ddd

Model card auto-generated by SimpleTuner

Files changed (1)
  1. README.md +132 -0
README.md ADDED
@@ -0,0 +1,132 @@
+ ---
+ license: apache-2.0
+ base_model: "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
+ tags:
+   - WanPipeline
+   - WanPipeline-diffusers
+   - text-to-image
+   - image-to-image
+   - diffusers
+   - simpletuner
+   - not-for-all-audiences
+   - lora
+   - template:sd-lora
+   - standard
+ pipeline_tag: text-to-image
+ inference: true
+ widget:
+ - text: 'A black and white animated scene unfolds featuring a distressed upright cow with prominent horns and expressive eyes, suspended by its legs from a hook on a static background wall. A smaller Mickey Mouse-like character enters, standing near a wooden bench, initiating interaction between the two. The cow''s posture changes as it leans, stretches, and falls, while the mouse watches with a concerned expression, its face a mixture of curiosity and worry, in a world devoid of color.'
+   parameters:
+     negative_prompt: '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走'
+   output:
+     url: ./assets/image_0_0.gif
+ ---
+
+ # wan-disney-DCM-distilled
+
+ This is a DCM-distilled PEFT LoRA derived from [Wan-AI/Wan2.1-T2V-1.3B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers).
+
+ The main validation prompt used during training was:
+ ```
+ A black and white animated scene unfolds featuring a distressed upright cow with prominent horns and expressive eyes, suspended by its legs from a hook on a static background wall. A smaller Mickey Mouse-like character enters, standing near a wooden bench, initiating interaction between the two. The cow's posture changes as it leans, stretches, and falls, while the mouse watches with a concerned expression, its face a mixture of curiosity and worry, in a world devoid of color.
+ ```
+
+
+ ## Validation settings
+ - CFG: `1.0`
+ - CFG Rescale: `0.0`
+ - Steps: `8`
+ - Sampler: `FlowMatchEulerDiscreteScheduler`
+ - Seed: `42`
+ - Resolution: `832x480`
+
+
+ Note: The validation settings are not necessarily the same as the [training settings](#training-settings).
+
+ You can find some example images in the following gallery:
+
+
+ <Gallery />
+
+ The text encoder **was not** trained.
+ You may reuse the base model text encoder for inference.
+
+
+ ## Training settings
+
+ - Training epochs: 0
+ - Training steps: 100
+ - Learning rate: 5e-05
+ - Learning rate schedule: cosine
+ - Warmup steps: 400000
+ - Max grad value: 0.01
+ - Effective batch size: 2
+ - Micro-batch size: 2
+ - Gradient accumulation steps: 1
+ - Number of GPUs: 1
+ - Gradient checkpointing: True
+ - Prediction type: flow_matching (extra parameters=['shift=17.0'])
+ - Optimizer: adamw_bf16
+ - Trainable parameter precision: Pure BF16
+ - Base model precision: `int8-quanto`
+ - Caption dropout probability: 0.1%
+
+
+ - LoRA Rank: 128
+ - LoRA Alpha: 128.0
+ - LoRA Dropout: 0.1
+ - LoRA initialisation style: default
+
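The batch, LoRA, and warmup figures in the training settings relate to each other by simple arithmetic. The sketch below is illustrative only and rests on conventions this card does not state: that effective batch size is micro-batch size × gradient accumulation steps × GPU count, that the LoRA update is scaled by alpha / rank (the standard LoRA scaling, not rsLoRA), and that warmup ramps the learning rate linearly from zero. All variable names are hypothetical.

```python
# Values copied from the training settings listed in this model card.
micro_batch_size = 2
gradient_accumulation_steps = 1
num_gpus = 1
lora_rank = 128
lora_alpha = 128.0
learning_rate = 5e-05
warmup_steps = 400_000
training_steps = 100

# Assumption: number of samples contributing to one optimizer step.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# Assumption: standard LoRA scaling factor applied to the low-rank update.
lora_scale = lora_alpha / lora_rank

# Assumption: linear warmup from 0 to the target learning rate.
# With far more warmup steps than training steps, the run ends deep inside warmup.
warmup_fraction = min(training_steps / warmup_steps, 1.0)
lr_at_last_step = learning_rate * warmup_fraction

print(effective_batch_size)  # 2
print(lora_scale)            # 1.0
print(lr_at_last_step)       # about 1.25e-08 under the linear-warmup assumption
```

If the linear-warmup assumption holds, the learning rate reached after 100 steps is a small fraction of the nominal `5e-05`, which is worth keeping in mind when comparing this run with others.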
+
+ ## Datasets
+
+ ### disney-black-and-white-wan
+ - Repeats: 10
+ - Total number of images: 68
+ - Total number of aspect buckets: 1
+ - Resolution: 0.2304 megapixels
+ - Cropped: False
+ - Crop style: None
+ - Crop aspect: None
+ - Used for regularisation data: No
+
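A rough sketch of how the dataset figures could relate to the reported 0 training epochs and to the resolution value. It assumes that `Repeats: 10` means each sample is seen 1 + 10 times per epoch and that an epoch is one pass over the repeated dataset at the effective batch size of 2. Neither convention is spelled out in this card, so treat the result as an estimate.

```python
import math

# Values copied from this model card.
total_samples = 68
repeats = 10
effective_batch_size = 2
training_steps = 100
resolution_megapixels = 0.2304

# Assumption: repeats add extra passes on top of the first one.
samples_per_epoch = total_samples * (1 + repeats)
steps_per_epoch = math.ceil(samples_per_epoch / effective_batch_size)

# Whole epochs completed within the configured number of steps.
completed_epochs = training_steps // steps_per_epoch

# The megapixel target expressed as the side of an equivalent square frame.
target_pixels = round(resolution_megapixels * 1_000_000)
square_side = math.isqrt(target_pixels)

print(steps_per_epoch)   # 374
print(completed_epochs)  # 0
print(square_side)       # 480
```

Under these assumptions 100 steps cover less than one epoch, which matches the `Training epochs: 0` entry. The training area of 0.2304 megapixels equals a 480x480 frame, smaller than the 832x480 validation resolution.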
+
+ ## Inference
+
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+ from diffusers.utils import export_to_gif
+
+ model_id = 'Wan-AI/Wan2.1-T2V-1.3B-Diffusers'
+ adapter_id = 'bghira/wan-disney-DCM-distilled'
+ pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16) # loading directly in bf16
+ pipeline.load_lora_weights(adapter_id)
+
+ prompt = "A black and white animated scene unfolds featuring a distressed upright cow with prominent horns and expressive eyes, suspended by its legs from a hook on a static background wall. A smaller Mickey Mouse-like character enters, standing near a wooden bench, initiating interaction between the two. The cow's posture changes as it leans, stretches, and falls, while the mouse watches with a concerned expression, its face a mixture of curiosity and worry, in a world devoid of color."
+ # Chinese negative prompt; roughly: vivid tones, overexposed, static, blurry details, subtitles,
+ # worst/low quality, JPEG artifacts, ugly, extra or fused fingers, deformed limbs, cluttered background, walking backwards.
+ negative_prompt = '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走'
+
+ # Optional: quantise the model to save on VRAM.
+ # Note: the model was quantised during training, so doing the same at inference time is recommended.
+ from optimum.quanto import quantize, freeze, qint8
+ quantize(pipeline.transformer, weights=qint8)
+ freeze(pipeline.transformer)
+
+ # Pick the best available device once and reuse it for the pipeline and the generator.
+ device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
+ pipeline.to(device) # the pipeline is already in its target precision level
+ model_output = pipeline(
+     prompt=prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=8,
+     generator=torch.Generator(device=device).manual_seed(42),
+     width=832,
+     height=480,
+     guidance_scale=1.0,
+     output_type='pil', # export_to_gif expects a list of PIL images
+ ).frames[0] # the Wan video pipeline returns `frames`, not `images`
+
+ export_to_gif(model_output, "output.gif", fps=15)
+
+ ```
+
+
+