SRPO

Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference

Xiangwei Shen1,2*, Zhimin Li1*, Zhantao Yang1, Shiyi Zhang3, Yingfang Zhang1, Donghao Li1,
Chunyu Wang1, Qinglin Lu1, Yansong Tang3,✝
1Hunyuan, Tencent 
2School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 
3Shenzhen International Graduate School, Tsinghua University 
*Equal contribution  ✝Corresponding author

Abstract

Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable rewards. However, these methods face two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive and restricts optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models to achieve the desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any timestep via interpolation, leveraging the fact that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX.1 Dev model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
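
The interpolation idea behind Direct-Align can be illustrated with a minimal sketch. It assumes a linear (rectified-flow style) schedule x_t = (1 - t) * x_0 + t * noise, which is an assumption made here for illustration and is not stated in this model card. The function names are hypothetical, and plain Python lists stand in for image latents. The sketch only shows that a clean sample can be recovered exactly from any t < 1 when the injected noise is known; it is not the authors' training code.

```python
import random

def add_noise(x0, noise, t):
    """Interpolate between a clean sample x0 and known noise at time t in [0, 1)."""
    return [(1.0 - t) * a + t * n for a, n in zip(x0, noise)]

def recover_clean(xt, noise, t):
    """Invert the interpolation; valid only if the injected noise is known and t < 1."""
    return [(x - t * n) / (1.0 - t) for x, n in zip(xt, noise)]

rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]   # stand-in for image latents
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]   # predefined noise prior (assumed Gaussian)

t = 0.9                                           # a heavily noised state
xt = add_noise(x0, noise, t)
x0_hat = recover_clean(xt, noise, t)

# Largest reconstruction error across all elements; only float rounding remains.
max_err = max(abs(a - b) for a, b in zip(x0, x0_hat))
```

Because the noise is fixed in advance, the recovery is a single closed-form step rather than a multistep denoising loop, which is the property the abstract relies on.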

Quick Start

Checkpoints

The diffusion_pytorch_model.safetensors checkpoint is the online version of SRPO based on FLUX.1 Dev, trained on the HPD dataset with HPSv2.

Inference

Load the base FLUX pipeline, then replace its transformer weights with the SRPO diffusion_pytorch_model.safetensors:

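import torch
from diffusers import FluxPipeline
from safetensors.torch import load_file

prompt = 'The Death of Ophelia by John Everett Millais, Pre-Raphaelite painting, Ophelia floating in a river surrounded by flowers, detailed natural elements, melancholic and tragic atmosphere'

# Assumed values: the original snippet leaves infer_step and generator undefined.
infer_step = 50
generator = torch.Generator(device="cuda").manual_seed(42)

# Load the base FLUX pipeline from a local directory.
pipe = FluxPipeline.from_pretrained(
    './data/flux',
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to("cuda")

# Overwrite the transformer weights with the SRPO checkpoint.
state_dict = load_file("./srpo/diffusion_pytorch_model.safetensors")
pipe.transformer.load_state_dict(state_dict)

image = pipe(
    prompt,
    guidance_scale=3.5,
    height=1024,
    width=1024,
    num_inference_steps=infer_step,
    max_sequence_length=512,
    generator=generator,
).images[0]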
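
Before swapping weights, it can help to confirm that the checkpoint's keys match those of the transformer, since a strict load_state_dict call raises an error on any mismatch. The helper below is a hypothetical sketch, not part of SRPO or diffusers, and the key names in the example are made up so that it runs without a GPU or the model files.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring what a strict load would reject."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # in the model, absent from the checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)   # in the checkpoint, unknown to the model
    return missing, unexpected

# Toy stand-ins; these key names are invented for illustration only.
model_keys = ["block.0.weight", "block.0.bias", "proj_out.weight"]
checkpoint_keys = ["block.0.weight", "block.0.bias", "extra.weight"]

missing, unexpected = compare_state_dict_keys(model_keys, checkpoint_keys)
```

With the real pipeline, the same helper could be called with pipe.transformer.state_dict().keys() and state_dict.keys(); two empty lists would indicate that the checkpoint lines up with the transformer.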

License

SRPO is licensed under the License Terms of SRPO. See ./License.txt for more details.

Citation

If you use SRPO for your research, please cite our paper:

@misc{shen2025directlyaligningdiffusiontrajectory,
      title={Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference}, 
      author={Xiangwei Shen and Zhimin Li and Zhantao Yang and Shiyi Zhang and Yingfang Zhang and Donghao Li and Chunyu Wang and Qinglin Lu and Yansong Tang},
      year={2025},
      eprint={2509.06942},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.06942}, 
}