Tags: Text-to-Image · Diffusers · Safetensors · English

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation (https://huggingface.co/papers/2402.10210)


SPIN-Diffusion-iter2

This model is a self-play fine-tuned diffusion model at iteration 2, trained from runwayml/stable-diffusion-v1-5 on synthetic data derived from the winner images of the yuvalkirstain/pickapic_v2 dataset. A Gradio demo is available at UCLA-AGI/SPIN-Diffusion-demo-v1.
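
Below is a minimal inference sketch (not part of the original card) showing one way to pair the fine-tuned UNet with the Stable Diffusion 1.5 pipeline via diffusers. The `subfolder="unet"` layout, dtype, and example prompt are assumptions; adjust them to match the actual repository contents.

```python
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Assumption: the fine-tuned UNet weights are stored under a "unet" subfolder in this repo.
unet = UNet2DConditionModel.from_pretrained(
    "UCLA-AGI/SPIN-Diffusion-iter2", subfolder="unet", torch_dtype=torch.float16
)

# Build the Stable Diffusion 1.5 pipeline around the fine-tuned UNet.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", unet=unet, torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Example prompt (illustrative only).
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("spin_diffusion_sample.png")
```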

Model Details

Model Description

  • Model type: A diffusion model with a fine-tuned UNet, based on the structure of Stable Diffusion 1.5
  • Language(s) (NLP): Primarily English
  • License: Apache-2.0
  • Finetuned from model: runwayml/stable-diffusion-v1-5

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2.0e-05
  • train_batch_size: 8
  • distributed_type: multi-GPU
  • num_devices: 8
  • train_gradient_accumulation_steps: 32
  • total_train_batch_size: 2048
  • optimizer: AdamW
  • lr_scheduler: linear
  • lr_warmup_steps: 200
  • num_training_steps: 500
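
The following is a hedged sketch (not the authors' training script) of how these hyperparameters fit together using standard PyTorch and diffusers APIs; note that 8 × 8 × 32 = 2048 matches the listed total_train_batch_size.

```python
# Hedged sketch only: illustrates how the hyperparameters above combine.
import torch
from diffusers import UNet2DConditionModel
from diffusers.optimization import get_scheduler

learning_rate = 2.0e-05
train_batch_size = 8                 # per-device batch size
num_devices = 8                      # multi-GPU training
gradient_accumulation_steps = 32
# Effective batch size: 8 * 8 * 32 = 2048 (the total_train_batch_size listed above).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# The UNet being fine-tuned (base model as stated in the card).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

optimizer = torch.optim.AdamW(unet.parameters(), lr=learning_rate)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=200,
    num_training_steps=500,
)
```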

Citation

@misc{yuan2024self,
      title={Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation}, 
      author={Yuan, Huizhuo and Chen, Zixiang and Ji, Kaixuan and Gu, Quanquan},
      year={2024},
      eprint={2402.10210},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}