VPT Models
Collection
Qwen2-VL models equipped with Visual Perception Tokens, or used during the training process.
7 items
This repository contains models based on the paper "Introducing Visual Perception Token into Multimodal Large Language Model". These models use Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).