blurgy committed on
Commit
0b38144
·
verified ·
0 Parent(s):

initial commit

Files changed (3)
  1. .gitattributes +55 -0
  2. LICENSE +0 -0
  3. README.md +158 -0
.gitattributes ADDED
@@ -0,0 +1,55 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.lz4 filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ # Audio files - uncompressed
+ *.pcm filter=lfs diff=lfs merge=lfs -text
+ *.sam filter=lfs diff=lfs merge=lfs -text
+ *.raw filter=lfs diff=lfs merge=lfs -text
+ # Audio files - compressed
+ *.aac filter=lfs diff=lfs merge=lfs -text
+ *.flac filter=lfs diff=lfs merge=lfs -text
+ *.mp3 filter=lfs diff=lfs merge=lfs -text
+ *.ogg filter=lfs diff=lfs merge=lfs -text
+ *.wav filter=lfs diff=lfs merge=lfs -text
+ # Image files - uncompressed
+ *.bmp filter=lfs diff=lfs merge=lfs -text
+ *.gif filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.tiff filter=lfs diff=lfs merge=lfs -text
+ # Image files - compressed
+ *.jpg filter=lfs diff=lfs merge=lfs -text
+ *.jpeg filter=lfs diff=lfs merge=lfs -text
+ *.webp filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
File without changes
README.md ADDED
@@ -0,0 +1,158 @@
+ ---
+ tags:
+ - text-to-image
+ - lora
+ - diffusers
+ - template:diffusion-lora
+ widget:
+ - text: A laptop above a dog
+   output:
+     url: images/laptop-above-dog_flux1_compass_004.jpg
+ - text: A bird below a skateboard
+   output:
+     url: images/flux_compass_bird1.jpg
+ - text: A horse to the left of a bottle
+   output:
+     url: images/horse-left-bottle_flux1_compass_003.jpg
+ base_model: black-forest-labs/FLUX.1-dev
+ instance_prompt: null
+ license: other
+ license_name: compass-lora-weights-nc-license
+ license_link: LICENSE
+ ---
+ # CoMPaSS-FLUX.1
+
+ <Gallery />
+
+ ## Model description
+
+ A LoRA adapter that enhances the spatial understanding of the FLUX.1 text-to-image diffusion model. It substantially improves the generation of images with specific spatial relationships between objects, as measured on the VISOR, T2I-CompBench, and GenEval benchmarks.
+
+ ## Model Details
+
+ - **Base Model**: FLUX.1-dev
+ - **LoRA Rank**: 16
+ - **Training Data**: SCOP dataset (curated from COCO)
+ - **File Size**: ~50 MiB
+ - **Framework**: Diffusers
+ - **License**: Non-Commercial (see LICENSE)
+
+ ## Intended Use
+
+ - Generating images with accurate spatial relationships between objects
+ - Creating compositions that require specific spatial arrangements
+ - Enhancing the base model's spatial understanding while maintaining its other capabilities
+
+ ## Performance
+
+ ### Key Improvements
+
+ - VISOR (uncond): +98% relative improvement
+ - T2I-CompBench Spatial: +67% relative improvement
+ - GenEval Position: +131% relative improvement
+ - Maintains or improves the base model's image fidelity (FID and CMMD scores)
+
+ ## Using the Model
+
+ ### Installation
+
+ ```python
+ from diffusers import DiffusionPipeline
+ import torch
+ from safetensors.torch import load_file
+
+ # Load the base model in half precision and move it to the GPU
+ pipe = DiffusionPipeline.from_pretrained(
+     "black-forest-labs/FLUX.1-dev",
+     torch_dtype=torch.float16,
+     variant="fp16"
+ ).to("cuda")
+
+ # Read the LoRA state dict from disk and apply it to the pipeline
+ lora_path = "path_to_compass_lora.safetensors"
+ state_dict = load_file(lora_path)
+ pipe.load_lora_weights(state_dict)
+ ```
+
+ ### Example Usage
+
+ ```python
+ prompt = "A motorcycle to the right of a bear"
+ image = pipe(prompt).images[0]
+ ```
+
+ ### Effective Prompting
+
+ The model works well with:
+ - Clear spatial relationship descriptors (left, right, above, below)
+ - Pairs of distinct objects
+ - Explicit spatial relationships (e.g., "A to the right of B")
+
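The prompting guidelines above can be turned into a small prompt builder. The sketch below is a hypothetical helper and not part of the CoMPaSS release: the names `RELATION_PHRASES`, `article`, and `spatial_prompt` are assumptions made for illustration, and the article choice is a simple vowel heuristic. It produces prompts in the same "A to the right of B" form as the examples in this README.

```python
# Hypothetical helper (not part of the CoMPaSS release): builds prompts of the
# form "A <subject> <relation phrase> a <reference>".
RELATION_PHRASES = {
    "left": "to the left of",
    "right": "to the right of",
    "above": "above",
    "below": "below",
}


def article(noun: str) -> str:
    """Pick 'a' or 'an' with a simple first-letter heuristic."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def spatial_prompt(subject: str, relation: str, reference: str) -> str:
    """Return an explicit spatial prompt for a pair of distinct objects."""
    if relation not in RELATION_PHRASES:
        raise ValueError(f"unsupported relation: {relation!r}")
    if subject == reference:
        raise ValueError("subject and reference should be distinct objects")
    phrase = RELATION_PHRASES[relation]
    return (
        f"{article(subject).capitalize()} {subject} {phrase} "
        f"{article(reference)} {reference}"
    )


if __name__ == "__main__":
    print(spatial_prompt("motorcycle", "right", "bear"))
```

The returned string can be passed directly to `pipe(...)` as in the Example Usage snippet.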
+ ## Training Details
+
+ ### Training Data
+
+ - Built using the SCOP (Spatial Constraints-Oriented Pairing) data engine
+ - ~28,000 curated object pairs from COCO
+ - Enforces criteria for:
+   - Visual significance
+   - Semantic distinction
+   - Spatial clarity
+   - Object relationships
+   - Visual balance
+
+ ### Training Process
+
+ - Trained for 24,000 steps
+ - Batch size of 4
+ - Learning rate: 1e-4
+ - Optimizer: AdamW with β₁=0.9, β₂=0.999
+ - Weight decay: 1e-2
+
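For a rough sense of scale, the step count and batch size above can be combined with the dataset size. This is a back-of-the-envelope sketch under assumptions the README does not state: that the batch size of 4 is the effective global batch size (no gradient accumulation or multi-device scaling) and that each curated object pair yields one training sample.

```python
# Figures taken from the Training Details section.
steps = 24_000
batch_size = 4
dataset_pairs = 28_000  # approximate ("~28,000 curated object pairs")

# Assumption: batch_size is the effective global batch size and every object
# pair corresponds to exactly one training sample.
samples_seen = steps * batch_size
approx_epochs = samples_seen / dataset_pairs

if __name__ == "__main__":
    print(f"samples seen: {samples_seen}")
    print(f"approximate passes over the data: {approx_epochs:.2f}")
```

If either assumption does not hold for the actual training run, the number of passes over the data scales accordingly.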
+ ## Evaluation Results
+
+ | Metric | Base FLUX.1 | +CoMPaSS | Relative Improvement |
+ |--------|-------------|----------|----------------------|
+ | VISOR uncond | 37.96% | 75.17% | +98% |
+ | T2I-CompBench Spatial | 0.18 | 0.30 | +67% |
+ | GenEval Position | 0.26 | 0.60 | +131% |
+ | FID (lower is better) | 27.96 | 26.40 | +5.6% |
+ | CMMD (lower is better) | 0.8737 | 0.6859 | +21.5% |
+
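The relative improvements in the table can be reproduced from the two score columns. The sketch below assumes the usual definition, the change divided by the base score, with the sign flipped for FID and CMMD because lower values are better for those metrics. The function name `relative_improvement` is introduced here for illustration only.

```python
def relative_improvement(base: float, new: float, lower_is_better: bool = False) -> float:
    """Relative improvement over the base score, in percent."""
    change = (base - new) if lower_is_better else (new - base)
    return 100.0 * change / base


# (metric, base FLUX.1 score, +CoMPaSS score, lower_is_better), from the table.
ROWS = [
    ("VISOR uncond", 37.96, 75.17, False),
    ("T2I-CompBench Spatial", 0.18, 0.30, False),
    ("GenEval Position", 0.26, 0.60, False),
    ("FID", 27.96, 26.40, True),
    ("CMMD", 0.8737, 0.6859, True),
]

if __name__ == "__main__":
    for name, base, new, lower in ROWS:
        print(f"{name}: {relative_improvement(base, new, lower):+.1f}%")
```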
+ ## Technical Specifications
+
+ - **Architecture**: MMDiT-based FLUX.1 with LoRA adaptation
+ - **LoRA Target**: DoubleStreamBlocks
+ - **Parameter Count**: Base model parameters + ~50 MiB LoRA weights
+ - **Input**: Text prompts (like base FLUX.1)
+ - **Output**: 1024×1024 images
+ - **Compute Requirements**: Similar to base FLUX.1
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+ ```bibtex
+ @article{zhang2024compass,
+   title={CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models},
+   author={Zhang, Gaoyang and Fu, Bingtao and Fan, Qingnan and Zhang, Qi and Liu, Runxing and Gu, Hong and Zhang, Huaqi and Liu, Xinguo},
+   journal={arXiv preprint arXiv:2412.13195},
+   year={2024}
+ }
+ ```
+
+ ## Acknowledgments
+
+ This work builds upon the FLUX.1 model by Black Forest Labs and utilizes the COCO dataset for training data curation.
+
+ ## Contact
+
+ For questions about the model, please contact [email protected]
+
+ ## Download model
+
+ Weights for this model are available in Safetensors format.
+
+ [Download](/blurgy/CoMPaSS-FLUX.1/tree/main) them in the Files & versions tab.