ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers Paper • 2305.15272 • Published May 24, 2023
TouchStone: Evaluating Vision-Language Models by Language Models Paper • 2308.16890 • Published Aug 31, 2023 • 1
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection Paper • 2204.02964 • Published Apr 6, 2022
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published 7 days ago • 22