# Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks
![teaser](asset/images/teaser.png)

## Environment

```bash
conda create -n Any2AnyTryon python=3.11
conda activate Any2AnyTryon
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install --upgrade -r requirements.txt
```

## Demo

Launch the demo with the command below; adding `--group_offloading` saves memory but slows down inference.

```bash
python app.py
```

## Inference

Choose a LoRA from [the model page](https://huggingface.co/loooooong/Any2anyTryon), or pass one to the script via `--lora_name`.

```bash
python infer.py --model_image ./asset/images/model/model1.png --garment_image ./asset/images/garment/garment1.jpg \
    --prompt=" a man t-shirt with pockets man wearing the t-shirt"
```

## Test

Download the dataset from the [VITON-HD repository](https://github.com/shadow2496/VITON-HD). Generate segmentation masks with [AutoMasker](https://github.com/Zheng-Chong/CatVTON/blob/edited/preprocess_agnostic_mask.py) for repainting.

For unpaired evaluation, run:

```bash
CUDA_VISIBLE_DEVICES=0 python test_vitonhd.py --model_path black-forest-labs/FLUX.1-dev \
    --model_dir data/zalando-hd-resized/test/image --garment_dir data/zalando-hd-resized/test/cloth \
    --output_dir ./results/vitonhd_test_unpaired_repaint --meta_file data/zalando-hd-resized/test_pairs.txt \
    --mask_dir data/zalando-hd-resized/test/mask --source_dir data/zalando-hd-resized/test/image \
    --train_double_block_only --repaint
```

For the paired test, download the [test set](https://huggingface.co/datasets/loooooong/Any2anyTryon_vitonhd_test) and extract the images to a local directory:

```bash
python src/download_data.py
```

Then run the paired evaluation:

```bash
CUDA_VISIBLE_DEVICES=0 python test_vitonhd.py --model_path black-forest-labs/FLUX.1-dev \
    --model_dir data/zalando-hd-resized/test/image_synthesis --garment_dir data/zalando-hd-resized/test/cloth \
    --output_dir ./results/vitonhd_test_paired_repaint --meta_file data/zalando-hd-resized/test_pairs.txt \
    --mask_dir data/zalando-hd-resized/test/mask --source_dir data/zalando-hd-resized/test/image \
    --train_double_block_only --repaint --paired
```

## Datasets

Some of the data can be downloaded from [LAION-Garment](https://huggingface.co/datasets/loooooong/LAION-Garment). Use the following code to ensure pixel correspondence between the original image and the synthesized image. Here `src_image_path` and `inpaint_bytes` are placeholders for your own file path and image bytes; a sketch of what `crop_to_multiple_of_16` is expected to do is given at the end of this README.

```python
import io
from PIL import Image
from src.utils import crop_to_multiple_of_16

img_src = Image.open(src_image_path)  # placeholder: path to the original image
img_inpaint = Image.open(io.BytesIO(inpaint_bytes))  # placeholder: raw bytes of the synthesized image
img_src = crop_to_multiple_of_16(img_src)
assert img_src.size == img_inpaint.size
```

## To-Do List

- [x] Demo code and gradio interface
- [x] Inference code
- [x] Tryon checkpoint
- [x] Model generation checkpoint
- [x] Garment reconstruction checkpoint
- [x] Base all-tasks checkpoint
- [ ] Dataset preparation
- [ ] Training code

## Citation

```bibtex
@misc{guo2025any2anytryonleveragingadaptiveposition,
      title={Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks},
      author={Hailong Guo and Bohan Zeng and Yiren Song and Wentao Zhang and Chuang Zhang and Jiaming Liu},
      year={2025},
      eprint={2501.15891},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.15891},
}
```
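
## Note on `crop_to_multiple_of_16`

For reference, here is a minimal sketch of the cropping behaviour that the snippet in the Datasets section relies on. It assumes the helper simply drops right/bottom pixels so that both dimensions become multiples of 16; the actual implementation lives in `src/utils.py` and may differ (for example, it could center-crop instead), so treat this as an illustration only.

```python
from PIL import Image


def crop_to_multiple_of_16_sketch(img: Image.Image) -> Image.Image:
    """Hypothetical stand-in for src.utils.crop_to_multiple_of_16:
    crop away right/bottom pixels so width and height are multiples of 16."""
    w, h = img.size
    return img.crop((0, 0, w - w % 16, h - h % 16))
```

Cropping (rather than resizing) is presumably what keeps the remaining pixels aligned between the original and the synthesized image, which is why the Datasets snippet asserts that the two sizes match after cropping.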