---
title: TiM
emoji: 🏆
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
python_version: "3.10"
---

Transition Models: Rethinking the Generative Learning Objective

ZiDong Wang<sup>1,2,*</sup> · Yiyuan Zhang<sup>1,2,*,‡</sup> · Xiaoyu Yue<sup>2,3</sup> · Xiangyu Yue<sup>1</sup> · Yangguang Li<sup>1,†</sup> · Wanli Ouyang<sup>1,2</sup> · Lei Bai<sup>2,†</sup>

<sup>1</sup>MMLab, CUHK   <sup>2</sup>Shanghai AI Lab   <sup>3</sup>USYD
<sup>*</sup>Equal Contribution   <sup>‡</sup>Project Lead   <sup>†</sup>Corresponding Authors

[arXiv]  [Model]  [Dataset] 


Highlights: We propose Transition Models (TiM), a novel generative model that learns to navigate the entire generative trajectory with unprecedented flexibility.

  • Our Transition Models (TiM) are trained to master arbitrary state-to-state transitions. This lets TiM learn the entire solution manifold of the generative process, unifying the few-step and many-step regimes within a single, powerful model (see the sketch after this list).
  • Despite having only 865M parameters, TiM achieves state-of-the-art performance, surpassing leading models such as SD3.5 (8B parameters) and FLUX.1 (12B parameters) at every evaluated step count on the GenEval benchmark. Importantly, unlike previous few-step generators, TiM shows monotonic quality improvement as the sampling budget increases.
  • Additionally, with our native-resolution strategy, TiM delivers exceptional fidelity at resolutions up to $4096\times4096$.
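To make the any-step idea concrete, here is a minimal, illustrative sketch (not TiM's actual API; the function names and shapes are hypothetical): a transition model $f(x_t, t, s)$ maps the state at time $t$ directly to the state at an arbitrary earlier time $s$, so one-step, few-step, and many-step sampling all use the same network and differ only in the time discretization.

```python
# Illustrative sketch only; names and signatures are hypothetical, not TiM's API.
import torch

def any_step_sample(model, shape, steps, device="cuda"):
    """Sample with `steps` transitions; steps=1 and steps=128 reuse the
    same model, only the time schedule changes."""
    x = torch.randn(shape, device=device)        # start from noise at t = 1
    times = torch.linspace(1.0, 0.0, steps + 1)  # any discretization works
    for t, s in zip(times[:-1], times[1:]):
        x = model(x, t.item(), s.item())         # direct transition t -> s
    return x
```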

🚨 News

  • 2025-9-5: We are delighted to introduce TiM, the first text-to-image generator to support any-step generation while being trained entirely from scratch. We have released the code and pretrained models of TiM.

1. Setup

First, clone the repo:

git clone https://github.com/WZDTHU/TiM.git && cd TiM

1.1 Environment Setup

conda create -n tim_env python=3.10
conda activate tim_env
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install flash-attn
pip install -r requirements.txt
pip install -e .
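A quick, optional sanity check after installation (assumes a CUDA machine; `flash_attn` is the import name of the flash-attn package):

```python
# Optional sanity check: verify torch, CUDA, and flash-attn after installation.
import torch
import flash_attn  # raises ImportError if flash-attn failed to build

print(torch.__version__, torch.cuda.is_available())
```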

1.2 Model Zoo (WIP)

Text-to-Image Generation

A single TiM model can perform any-step generation (one-step, few-step, and multi-step) and shows monotonic quality improvement as the sampling budget increases.

| Model | Model Zoo | Model Size | VAE | 1-NFE GenEval | 8-NFE GenEval | 128-NFE GenEval |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| TiM-T2I | [🤗 HF](https://huggingface.co/GoodEnough/TiM-T2I) | 865M | DC-AE | 0.67 | 0.76 | 0.83 |

mkdir checkpoints
wget -c "https://huggingface.co/GoodEnough/TiM-T2I/resolve/main/t2i_model.bin" -O checkpoints/t2i_model.bin

Class-guided Image Generation

| Model | Model Zoo | Model Size | VAE | 2-NFE FID | 500-NFE FID |
|:---:|:---:|:---:|:---:|:---:|:---:|
| TiM-C2I-256 | [🤗 HF](https://huggingface.co/GoodEnough/TiM-C2I) | 664M | SD-VAE | 6.14 | 1.65 |
| TiM-C2I-512 | [🤗 HF](https://huggingface.co/GoodEnough/TiM-C2I) | 664M | DC-AE | 4.79 | 1.69 |

mkdir checkpoints
wget -c "https://huggingface.co/GoodEnough/TiM-C2I/resolve/main/c2i_model_256.safetensors" -O checkpoints/c2i_model_256.safetensors
wget -c "https://huggingface.co/GoodEnough/TiM-C2I/resolve/main/c2i_model_512.safetensors" -O checkpoints/c2i_model_512.safetensors

2. Sampling

Text-to-Image Generation

We provide sampling scripts for three benchmarks: GenEval, DPGBench, and MJHQ30K. You can specify the sampling steps, resolution, and CFG scale in the corresponding scripts.

Sampling with TiM-T2I model on GenEval benchmark:

bash scripts/sample/t2i/sample_t2i_geneval.sh

Sampling with TiM-T2I model on DPGBench benchmark:

bash scripts/sample/t2i/sample_t2i_dpgbench.sh

Sampling with TiM-T2I model on MJHQ30k benchmark:

bash scripts/sample/t2i/sample_t2i_mjhq30k.sh

Class-guided Image Generation

We provide the sampling scripts for ImageNet-256 and ImageNet-512.

Sampling with C2I model on $256\times256$ resolution:

bash scripts/sample/c2i/sample_256x256.sh

Sampling with C2I model on $512\times512$ resolution:

bash scripts/sample/c2i/sample_512x512.sh

3. Evaluation

Text-to-Image Generation

GenEval

Please follow the GenEval instructions to set up the conda environment.

Given the directory of generated images SAMPLING_DIR and the object-detector folder OBJECT_DETECTOR_FOLDER, run the following command:

python projects/evaluate/geneval/evaluation/evaluate_images.py $SAMPLING_DIR --outfile geneval_results.jsonl --model-path $OBJECT_DETECTOR_FOLDER

This produces a JSONL file with one line per image. Run the following command to obtain the GenEval score:

python projects/evaluate/geneval/evaluation/summary_scores.py geneval_results.jsonl
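The summary script aggregates the per-image results; if you want to inspect them yourself first, the JSONL is easy to read (the exact field names are whatever the evaluator writes, so check the schema before relying on it):

```python
# Peek at the per-image GenEval results; inspect the schema before aggregating.
import json

with open("geneval_results.jsonl") as f:
    rows = [json.loads(line) for line in f]
print(len(rows), "images evaluated")
print(rows[0])  # shows the exact field names the evaluator emits
```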

DPGBench

Please follow the DPGBench instructions to set up the conda environment. Given the directory of generated images SAMPLING_DIR, run the following command:

python projects/evaluate/dpg_bench/compute_dpg_bench.py --image-root-path $SAMPLING_DIR --res-path dpgbench_results.txt --pic-num 4 

MJHQ30K

Please download the MJHQ30K dataset to serve as the reference images.

Given the reference-image directory REFERENCE_DIR and the generated-image directory SAMPLING_DIR, run the following command to calculate the FID score:

python projects/evaluate/mjhq30k/calculate_fid.py $REFERENCE_DIR $SAMPLING_DIR
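For reference, FID compares the Gaussian statistics (mean $\mu$ and covariance $\Sigma$) of Inception features between the reference set $r$ and the generated set $g$; lower is better:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$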

For the CLIP score, first compute the text features and save them to MJHQ30K_TEXT_FEAT:

python projects/evaluate/mjhq30k/calculate_clip.py projects/evaluate/mjhq30k/meta_data.json $MJHQ30K_TEXT_FEAT/clip_feat.safetensors --save-stats

Then run the following command to calculate the CLIP score:

python projects/evaluate/mjhq30k/calculate_clip.py $MJHQ30K_TEXT_FEAT/clip_feat.safetensors $SAMPLING_DIR
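For reference, the CLIP score is commonly defined as the mean cosine similarity between image and text embeddings (some implementations scale it by 100; the exact convention used by the script may differ):

$$\mathrm{CLIPScore} = \frac{1}{N}\sum_{i=1}^{N} \cos\!\big(f_I(x_i),\, f_T(c_i)\big)$$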

Class-guided Image Generation

Sampling generates a folder of samples from which FID, Inception Score, and other metrics are computed. Note that we do not pack the generated samples into a .npz file; this does not affect the calculation of FID and the other metrics. Please follow ADM's TensorFlow evaluation suite to set up the conda environment and download the reference batch.

wget -c "https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/classify_image_graph_def.pb" -O checkpoints/classify_image_graph_def.pb

Given the reference-batch directory REFERENCE_DIR and the generated-image directory SAMPLING_DIR, run the following command:

python projects/evaluate/adm_evaluator.py $REFERENCE_DIR $SAMPLING_DIR

4. Training

4.1 Dataset Setup

Currently, we provide preprocessed datasets for ImageNet-1K. Please use the following commands to download the preprocessed latents.

bash tools/download_imagenet_256x256.sh
bash tools/download_imagenet_512x512.sh

For text-to-image generation, we provide a toy dataset. Please use the following command to download this dataset.

bash tools/download_toy_t2i_dataset.sh

4.2 Download Image Encoder

We use RADIO-v2.5-b as the image encoder for the REPA loss.

wget -c "https://huggingface.co/nvidia/RADIO/resolve/main/radio-v2.5-b_half.pth.tar" -O checkpoints/radio-v2.5-b_half.pth.tar

4.3 Training Scripts

Specify the image_dir in configs/c2i/tim_b_p4.yaml and train the base model (131M) on ImageNet-256:

bash scripts/train/c2i/train_tim_c2i_b.sh

Specify the image_dir in configs/c2i/tim_xl_p2_256.yaml and train the XL model (664M) on ImageNet-256:

bash scripts/train/c2i/train_tim_c2i_xl_256.sh

Specify the image_dir in configs/c2i/tim_xl_p2_512.yaml and train the XL model (664M) on ImageNet-512:

bash scripts/train/c2i/train_tim_c2i_xl_512.sh

Specify the root_dir in configs/t2i/tim_xl_p1_t2i.yaml and train the T2I model (865M) on the Toy-T2I-Dataset:

bash scripts/train/t2i/train_tim_t2i.sh

Citations

If you find this project useful, please cite:

@article{wang2025transition,
  title={Transition Models: Rethinking the Generative Learning Objective}, 
  author={Wang, Zidong and Zhang, Yiyuan and Yue, Xiaoyu and Yue, Xiangyu and Li, Yangguang and Ouyang, Wanli and Bai, Lei},
  year={2025},
  eprint={2509.04394},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

https://arxiv.org/abs/2509.04394

License

This project is licensed under the Apache-2.0 license.