UNI input image size during training
Dear Mahmood Lab Team,
Thank you for making UNI available.
I would like to understand which input image size UNI was trained at.
From the UNI publication it appears to be 256x256 (for the majority of iterations) and 512x512 (for high-resolution fine-tuning at the end); however, the Hugging Face docs suggest an input image size of 224x224 pixels. Why is this the case?
Of course, ideally the ViT should be able to generalize to varying sequence lengths, but replicating the training input image size seems like the most straightforward approach.
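For what it's worth, a quick probe like the following (a minimal sketch assuming the timm-based loading suggested in the model card, and access to the gated MahmoodLab/uni weights) should show whether the position embeddings can be interpolated to non-224 grids:

```python
import timm
import torch

# Minimal sketch: load UNI via timm from the Hugging Face Hub, as
# suggested in the model card (requires access to the gated weights).
model = timm.create_model(
    "hf-hub:MahmoodLab/uni",
    pretrained=True,
    init_values=1e-5,        # layer-scale init from the model card
    dynamic_img_size=True,   # interpolate position embeddings as needed
)
model.eval()

# Probe the documented 224x224 size and the 256x256 / 512x512 sizes
# reported in the publication; with dynamic_img_size=True the ViT
# resizes its position-embedding grid on the fly for each input.
for size in (224, 256, 512):
    x = torch.randn(1, 3, size, size)
    with torch.no_grad():
        feats = model(x)
    print(size, tuple(feats.shape))
```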
Looking forward to your reply.
Best,
Lydia
Hi all,
We have also run into the same confusion. Has there been any guidance on this? Does it mean a 224x224 resize is applied to the 256x256 and 512x512 images?
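If so, the preprocessing would presumably look something like the sketch below (not the official transform; the mean/std here are the ImageNet values commonly paired with timm ViTs and may not match the model card exactly):

```python
from torchvision import transforms

# Hypothetical sketch of a 224x224 resize applied to 256x256 (or
# 512x512) tiles; exact values may differ from the official model card.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),  # ImageNet mean
                         std=(0.229, 0.224, 0.225)),  # ImageNet std
])
```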
Looking forward to any reply!
Thank you and Best,
Alex