Qwen2-VL-2b-VPT-Det / README.md
metadata
base_model:
  - Qwen/Qwen2-VL-2B-Instruct
datasets:
  - rp-yu/VPT_Datasets
language:
  - en
library_name: transformers
license: apache-2.0
metrics:
  - accuracy
pipeline_tag: image-text-to-text

Introducing Visual Perception Token into Multimodal Large Language Model

This repository contains models from the paper "Introducing Visual Perception Token into Multimodal Large Language Model". The models are built on Qwen/Qwen2-VL-2B-Instruct and use Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).

Code: https://github.com/yu-rp/VisualPerceptionToken
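
To run the code from the repository above, the checkpoint files first need to be available locally. The following is a minimal sketch of one way to fetch them with `huggingface_hub`. The repo id `rp-yu/Qwen2-VL-2b-VPT-Det` is an assumption inferred from this page's title and the dataset owner, and the `checkpoints` folder and both helper names are hypothetical; check the code repository for the layout its scripts expect.

```python
from pathlib import Path

# Assumption: repo id inferred from the page title and the dataset owner.
# Verify it on the Hub before relying on it.
REPO_ID = "rp-yu/Qwen2-VL-2b-VPT-Det"


def default_local_dir(repo_id: str, root: str = "checkpoints") -> Path:
    """Map 'owner/name' to '<root>/name' so each checkpoint gets its own folder."""
    return Path(root) / repo_id.split("/")[-1]


def download_checkpoint(repo_id: str = REPO_ID, root: str = "checkpoints") -> Path:
    """Download every file of the model repo and return the local folder."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = default_local_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download_checkpoint()` returns the folder that holds the weights and config. Whether the checkpoint loads with the stock Qwen2-VL classes in `transformers` or needs the model code from the repository is not stated here, so the inference scripts in the code repository are the safer starting point.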