
This model is part of the research work described in "FeatureFusion: Merging Diffusion Models Through Representation Correlations" by Murdock Aubry and James Bona-Landry.

<h1>
Model Description
</h1>

<h2>Overview</h2>
This model is a food specialist: a fine-tuned variant of Stable Diffusion 1.4 intended for food-related prompts.

<br>
<h2>Model Details</h2>

Base Model: CompVis/stable-diffusion-v1-4<br>
Type: Specialist<br>
Specialization: Food<br>
Training Data: Food shard<br>
Model Architecture: UNet-based diffusion model

<h2>Limitations</h2>

The model has the same limitations as the base Stable Diffusion model.<br>
Best performance is achieved when prompts relate to the model's specialization.<br>
The model may produce unexpected results for concepts outside its training distribution.
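
Since the model works best on prompts within its specialization, the following is a minimal inference sketch using the `diffusers` library. It assumes the weights are published in the standard Stable Diffusion pipeline layout. The helper names `food_prompt` and `generate_food_image`, the prompt template, and the sampler settings (50 steps, guidance scale 7.5) are illustrative assumptions, not values from the paper. The usage comment points at the base model ID; substitute this specialist's repository ID.

```python
def food_prompt(dish):
    """Build a food-focused prompt (hypothetical template, for illustration only)."""
    return f"a photo of {dish}, professional food photography"


def generate_food_image(model_id, prompt, device="cuda", seed=0):
    """Load a Stable Diffusion pipeline and generate one image for `prompt`."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision is only a safe default on GPU; fall back to float32 elsewhere.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    # A seeded generator makes the output reproducible for a given setup.
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=50,  # assumed setting, not from the paper
        guidance_scale=7.5,      # assumed setting, not from the paper
        generator=generator,
    )
    return result.images[0]


# Example (downloads weights; a GPU is recommended):
# image = generate_food_image("CompVis/stable-diffusion-v1-4", food_prompt("ramen"))
# image.save("ramen.png")
```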


<h1>Training</h1>

<h2>Training Procedure</h2>

Training Data: Pick-a-Pic v1<br>
Training Method: Fine-tuning of the UNet while keeping the text encoder and VAE frozen
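
The freezing scheme described above can be sketched as follows. This is an illustrative helper, not the authors' training code: the function name `freeze_for_unet_finetuning` is hypothetical, and it only assumes the three components behave like PyTorch modules (as the `unet`, `vae`, and `text_encoder` attributes of a `diffusers` Stable Diffusion pipeline do).

```python
def freeze_for_unet_finetuning(unet, vae, text_encoder):
    """Freeze the VAE and text encoder; leave only the UNet trainable.

    Each argument is expected to behave like a torch.nn.Module.
    Returns the list of UNet parameters to hand to the optimizer.
    """
    # Frozen components: no gradients, and eval mode so dropout-style
    # layers (if any) behave deterministically.
    vae.requires_grad_(False)
    text_encoder.requires_grad_(False)
    vae.eval()
    text_encoder.eval()

    # Trainable component: gradients on, training mode.
    unet.requires_grad_(True)
    unet.train()

    return [p for p in unet.parameters() if p.requires_grad]
```

Passing only the returned parameters to the optimizer keeps optimizer state from being allocated for the frozen modules.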

<h2>Hyperparameters</h2>

Optimizer: AdamW<br>
Learning rate: 1e-6<br>
Schedule: Cosine with warmup<br>
Training steps: 5 epochs on 1,000 data samples<br>
Memory optimization: Gradient accumulation (4 steps), attention slicing, VAE slicing, gradient checkpointing
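
To make the schedule and step count concrete, here is a small sketch in plain Python. It uses the common half-cosine form with linear warmup; the exact schedule implementation used in training is not stated here. The per-device batch size (1) and the warmup length (10% of optimizer steps) are assumptions chosen for illustration, as are the function names.

```python
import math


def optimizer_steps(num_samples=1000, epochs=5, batch_size=1, grad_accum=4):
    """Number of optimizer updates, given gradient accumulation.

    batch_size=1 is an assumption; the other defaults follow the values above.
    """
    batches_per_epoch = math.ceil(num_samples / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return epochs * updates_per_epoch


def cosine_with_warmup_lr(step, total_steps, warmup_steps, peak_lr=1e-6):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = optimizer_steps()   # 5 epochs * ceil(1000 / 4) updates per epoch
warmup = total // 10        # assumed warmup length (10% of updates)
for s in (0, warmup, total // 2, total):
    print(s, cosine_with_warmup_lr(s, total, warmup))
```

Under these assumptions the run performs 1,250 optimizer updates, with the learning rate rising to 1e-6 over the first 125 updates and decaying to zero by the end.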

<h1>Citation</h1>

If you use this model in your research, please cite:<br>
@article{aubry2024featurefusion,<br>
title={FeatureFusion: Merging Diffusion Models Through Representation Correlations},<br>
author={Aubry, Murdock and Bona-Landry, James},<br>
journal={},<br>
year={2025}<br>
}


---
license: mit
language:
- en
base_model:
- CompVis/stable-diffusion-v1-4
pipeline_tag: text-to-image
---