CoMPaSS-FLUX.1

[Project Page] [code] [arXiv]

Example prompts:

  • a photo of a laptop above a dog
  • a photo of a bird below a skateboard
  • a photo of a horse to the left of a bottle

Model description

A LoRA adapter that improves the spatial understanding of the FLUX.1 text-to-image diffusion model. Compared with the base model, it follows prompted spatial relationships between objects (left, right, above, below) considerably more reliably, as measured on the VISOR, T2I-CompBench, and GenEval benchmarks.

Model Details

  • Base Model: FLUX.1-dev
  • LoRA Rank: 16
  • Training Data: SCOP dataset (curated from COCO)
  • File Size: ~50 MiB
  • Framework: Diffusers
  • License: Non-Commercial (see ./LICENSE)
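
For readers unfamiliar with the term, the rank listed above refers to the inner dimension of the low-rank update in the standard LoRA formulation. The block below is the generic textbook form, not a description of this adapter's internals; the scaling factor α and the set of adapted layers are not stated in this card.

```latex
W' = W + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},
\quad A \in \mathbb{R}^{r \times d_{\text{in}}},
\quad r = 16
```

Each adapted weight matrix therefore adds r(d_in + d_out) trainable parameters, which is why the adapter file is small relative to the base model.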

Intended Use

  • Generating images with accurate spatial relationships between objects
  • Creating compositions that require specific spatial arrangements
  • Enhancing the base model's spatial understanding while maintaining its other capabilities

Performance

Key Improvements

  • VISOR (uncond): +98% relative improvement (37.96% → 75.17%)
  • T2I-CompBench Spatial: +67% relative improvement (0.18 → 0.30)
  • GenEval Position: +131% relative improvement (0.26 → 0.60)
  • Improves on the base model's image fidelity: FID drops from 27.96 to 26.40 and CMMD from 0.8737 to 0.6859 (lower is better)
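
The relative improvements above can be reproduced from the baseline and CoMPaSS scores reported under Evaluation Results. A minimal sketch, assuming "relative improvement" means (new − base) / base; the dictionary and function names are illustrative only:

```python
# (base FLUX.1 score, +CoMPaSS score), copied from the Evaluation Results table
SCORES = {
    "VISOR uncond": (37.96, 75.17),
    "T2I-CompBench Spatial": (0.18, 0.30),
    "GenEval Position": (0.26, 0.60),
}


def relative_improvement(base: float, new: float) -> float:
    """Return the change from base to new as a percentage of base."""
    return (new - base) / base * 100.0


for name, (base, new) in SCORES.items():
    print(f"{name}: {relative_improvement(base, new):+.0f}%")
# VISOR uncond: +98%
# T2I-CompBench Spatial: +67%
# GenEval Position: +131%
```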

Using the Model

See our GitHub repository to get started.
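
For a quick experiment, the adapter can likely be attached to the base model through the generic Diffusers LoRA path. This is a sketch under assumptions, not the project's documented workflow: it assumes the weights in `blurgy/CoMPaSS-FLUX.1` load with `load_lora_weights` without a `weight_name`, that a CUDA GPU with enough memory is available, and that you have accepted the gated FLUX.1-dev license on the Hub. The GitHub repository remains the reference and may apply additional inference-time components that this sketch omits. The function names and the step and guidance values are illustrative.

```python
BASE_MODEL = "black-forest-labs/FLUX.1-dev"  # base model named in this card
ADAPTER_REPO = "blurgy/CoMPaSS-FLUX.1"       # Hub repository of this adapter


def load_pipeline(device: str = "cuda"):
    """Load FLUX.1-dev and attach the LoRA weights (assumed loadable as-is)."""
    # Heavy dependencies are imported lazily so the module can be inspected
    # without torch/diffusers installed.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
    pipe.load_lora_weights(ADAPTER_REPO)
    return pipe.to(device)


def generate(pipe, prompt: str, seed: int = 0):
    """Generate one image for the prompt with a fixed seed."""
    import torch

    generator = torch.Generator("cpu").manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=28,  # illustrative value, not from this card
        guidance_scale=3.5,      # illustrative value, not from this card
        generator=generator,
    )
    return result.images[0]


# Example usage (downloads several GB of weights on first run):
#   pipe = load_pipeline()
#   generate(pipe, "a photo of a laptop above a dog").save("laptop_above_dog.png")
```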

Effective Prompting

The model works well with:

  • Clear spatial relationship descriptors (left, right, above, below)
  • Pairs of distinct objects
  • Explicit spatial relationships (e.g., "a photo of A to the right of B")
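
The prompt pattern above is easy to generate programmatically. A minimal sketch; the helper names and the simple a/an rule are illustrative assumptions, not part of the project's code:

```python
# Relation keyword -> phrase used in the prompt template
RELATIONS = {
    "left": "to the left of",
    "right": "to the right of",
    "above": "above",
    "below": "below",
}


def with_article(noun: str) -> str:
    """Prefix a noun with 'a' or 'an' using a simple first-letter rule."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"


def spatial_prompt(subject: str, relation: str, reference: str) -> str:
    """Build 'a photo of A <relation> B' for two distinct objects."""
    if relation not in RELATIONS:
        raise ValueError(f"relation must be one of {sorted(RELATIONS)}")
    if subject == reference:
        raise ValueError("subject and reference should be distinct objects")
    return (
        f"a photo of {with_article(subject)} "
        f"{RELATIONS[relation]} {with_article(reference)}"
    )


print(spatial_prompt("horse", "left", "bottle"))
# a photo of a horse to the left of a bottle
```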

Training Details

Training Data

  • Built using the SCOP (Spatial Constraints-Oriented Pairing) data engine
  • ~28,000 curated object pairs from COCO
  • Enforces criteria for:
    • Visual significance
    • Semantic distinction
    • Spatial clarity
    • Object relationships
    • Visual balance

Training Process

  • Trained for 24,000 steps
  • Batch size of 4
  • Learning rate: 1e-4
  • Optimizer: AdamW with β₁=0.9, β₂=0.999
  • Weight decay: 1e-2
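
To put these hyperparameters in context, the sketch below collects them in one place and estimates how much data the run covers. It assumes the batch size of 4 is the effective batch per optimizer step and that each of the ~28,000 curated pairs counts as one training sample; neither is stated in this card. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    steps: int = 24_000
    batch_size: int = 4              # assumed to be the effective batch size
    learning_rate: float = 1e-4
    adam_betas: tuple = (0.9, 0.999)
    weight_decay: float = 1e-2
    dataset_pairs: int = 28_000      # approximate size reported in this card

    @property
    def samples_seen(self) -> int:
        """Total training samples processed over the whole run."""
        return self.steps * self.batch_size

    @property
    def approx_epochs(self) -> float:
        """Approximate number of passes over the curated pairs."""
        return self.samples_seen / self.dataset_pairs


cfg = TrainConfig()
print(cfg.samples_seen)             # 96000
print(round(cfg.approx_epochs, 2))  # 3.43

# With PyTorch, the optimizer settings would map onto:
#   torch.optim.AdamW(params, lr=cfg.learning_rate,
#                     betas=cfg.adam_betas, weight_decay=cfg.weight_decay)
```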

Evaluation Results

| Metric | FLUX.1 | +CoMPaSS |
|---|---|---|
| VISOR uncond (⬆️) | 37.96% | 75.17% |
| T2I-CompBench Spatial (⬆️) | 0.18 | 0.30 |
| GenEval Position (⬆️) | 0.26 | 0.60 |
| FID (⬇️) | 27.96 | 26.40 |
| CMMD (⬇️) | 0.8737 | 0.6859 |

Citation

If you use this model in your research, please cite:

@inproceedings{zhang2025compass,
  title={CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models},
  author={Zhang, Gaoyang and Fu, Bingtao and Fan, Qingnan and Zhang, Qi and Liu, Runxing and Gu, Hong and Zhang, Huaqi and Liu, Xinguo},
  booktitle={ICCV},
  year={2025}
}

Contact

For questions about the model, please contact [email protected]

Download model

Weights for this model are available in Safetensors format.

Download them in the Files & versions tab.
