bghira committed 546efed (verified) · Parent(s): e16db7c

Model card auto-generated by SimpleTuner

Files changed (1): README.md added (+139 lines)
---
license: openrail++
base_model: "terminusresearch/pixart-900m-1024-ft-v0.6"
tags:
  - pixart_sigma
  - pixart_sigma-diffusers
  - text-to-image
  - image-to-image
  - diffusers
  - simpletuner
  - not-for-all-audiences
  - lora
  - controlnet
  - template:sd-lora
  - standard
pipeline_tag: text-to-image
inference: true
widget:
  - text: 'A photo-realistic image of a cat'
    parameters:
      negative_prompt: 'ugly, cropped, blurry, low-quality, mediocre average'
    output:
      url: ./assets/image_0_0.png
---

# pixart-controlnet-lora-test

This is a ControlNet PEFT LoRA derived from [terminusresearch/pixart-900m-1024-ft-v0.6](https://huggingface.co/terminusresearch/pixart-900m-1024-ft-v0.6).

The main validation prompt used during training was:

```
A photo-realistic image of a cat
```

## Validation settings

- CFG: `4.0`
- CFG Rescale: `0.0`
- Steps: `16`
- Sampler: `ddim`
- Seed: `42`
- Resolution: `1024x1024`

Note: The validation settings are not necessarily the same as the [training settings](#training-settings).
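If you want to approximate the validation sampler outside of SimpleTuner, you can swap the pipeline's scheduler for DDIM before generating. This is a minimal sketch against the standard diffusers scheduler API; it assumes the `pipeline` object built in the [Inference](#inference) section below, and that the trailing timestep spacing recorded under the training settings is also what you want at inference time.

```python
from diffusers import DDIMScheduler

# Swap in a DDIM scheduler to match the `ddim` validation sampler.
# timestep_spacing="trailing" mirrors the inference_scheduler_timestep_spacing
# value listed under Training settings; drop it if you prefer the scheduler default.
pipeline.scheduler = DDIMScheduler.from_config(
    pipeline.scheduler.config,
    timestep_spacing="trailing",
)
```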
You can find some example images in the following gallery:

<Gallery />

The text encoder **was not** trained. You may reuse the base model text encoder for inference.
## Training settings

- Training epochs: 224
- Training steps: 450
- Learning rate: 0.0001
- Learning rate schedule: constant
- Warmup steps: 500
- Max grad value: 0.01
- Effective batch size: 3
- Micro-batch size: 1
- Gradient accumulation steps: 1
- Number of GPUs: 3
- Gradient checkpointing: False
- Prediction type: epsilon (extra parameters=['training_scheduler_timestep_spacing=trailing', 'inference_scheduler_timestep_spacing=trailing', 'controlnet_enabled'])
- Optimizer: adamw_bf16
- Trainable parameter precision: Pure BF16
- Base model precision: `no_change`
- Caption dropout probability: 0.0%

- LoRA Rank: 64
- LoRA Alpha: 64.0
- LoRA Dropout: 0.1
- LoRA initialisation style: default
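For reference, the LoRA hyperparameters above map onto a PEFT `LoraConfig` roughly as follows. This is an illustrative sketch rather than the exact object SimpleTuner constructs; in particular, `target_modules` is an assumption and depends on the transformer's attention layer names.

```python
from peft import LoraConfig

# Approximate PEFT configuration matching the card's LoRA settings.
# target_modules is an assumption (typical attention projections);
# SimpleTuner's actual module selection may differ.
lora_config = LoraConfig(
    r=64,                    # LoRA Rank
    lora_alpha=64,           # LoRA Alpha
    lora_dropout=0.1,        # LoRA Dropout
    init_lora_weights=True,  # "default" initialisation style
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
```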
## Datasets

### antelope-data-1024
- Repeats: 0
- Total number of images: ~6
- Total number of aspect buckets: 1
- Resolution: 1.048576 megapixels
- Cropped: True
- Crop style: center
- Crop aspect: square
- Used for regularisation data: No
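Since the training data was center-cropped to squares at roughly 1.048576 megapixels (1024x1024), control images will generally behave best if they are preprocessed the same way before being passed to the pipeline. Below is a minimal sketch using Pillow; the file path is the same placeholder used in the inference example.

```python
from PIL import Image

def center_crop_square(img: Image.Image, size: int = 1024) -> Image.Image:
    """Center-crop to a square, then resize to size x size, matching the training crop."""
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    return img.crop((left, top, left + side, top + side)).resize((size, size), Image.LANCZOS)

control_image = center_crop_square(Image.open("path/to/control/image.png"))
```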
## Inference

```python
import torch
from diffusers import PixArtSigmaControlNetPipeline
# If you're not in the SimpleTuner environment, this import will fail.
from helpers.models.pixart.controlnet import PixArtSigmaControlNetAdapterModel
from PIL import Image

# Base model and adapter repository identifiers
base_model_id = "terminusresearch/pixart-900m-1024-ft-v0.6"
controlnet_id = "bghira/pixart-controlnet-lora-test"

# Load the ControlNet adapter from the repository's `controlnet` subfolder
controlnet = PixArtSigmaControlNetAdapterModel.from_pretrained(
    f"{controlnet_id}/controlnet"
)

# Create the pipeline from the base model, attaching the ControlNet adapter
pipeline = PixArtSigmaControlNetPipeline.from_pretrained(
    base_model_id,
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline.to(device)

# Load your control image
control_image = Image.open("path/to/control/image.png")

# Generate, mirroring the validation settings above (16 steps, CFG 4.0, seed 42)
prompt = "A photo-realistic image of a cat"
image = pipeline(
    prompt=prompt,
    image=control_image,
    num_inference_steps=16,
    guidance_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(42),
    controlnet_conditioning_scale=1.0,
).images[0]

image.save("output.png")
```
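If the full bf16 pipeline does not fit in GPU memory, diffusers' standard offloading helper can be used instead of moving the whole pipeline to the device. This assumes `PixArtSigmaControlNetPipeline` subclasses `DiffusionPipeline` and that `accelerate` is installed; a sketch:

```python
# Optional: stream submodules to the GPU on demand instead of calling pipeline.to(device).
# Slower per image, but with a substantially lower peak VRAM footprint.
pipeline.enable_model_cpu_offload()
```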