---
base_model:
- Qwen/Qwen2-VL-2B-Instruct
datasets:
- rp-yu/VPT_Datasets
language:
- en
library_name: transformers
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
---
# Introducing Visual Perception Token into Multimodal Large Language Model
This repository contains models from the paper *Introducing Visual Perception Token into Multimodal Large Language Model*. These models use Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).
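Since the checkpoints are built on Qwen2-VL-2B-Instruct and tagged for the `image-text-to-text` pipeline, they can likely be loaded through the standard Qwen2-VL path in `transformers`. The sketch below is a minimal, unofficial example under that assumption: the repository ID, image path, and prompt are placeholders, and any VPT-specific preprocessing or generation steps should be confirmed against the paper's official code.

```python
# Minimal sketch, assuming the model loads like a standard Qwen2-VL checkpoint.
# Replace MODEL_ID with the actual repository ID of this model.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"  # placeholder: substitute the VPT checkpoint

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Build a single-image chat prompt.
image = Image.open("example.jpg")  # placeholder image path
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=128)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```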