rp-yu committed on
Commit df651c2 · verified · 1 Parent(s): 81b3986

Update README.md

Files changed (1)
  1. README.md +12 -5
README.md CHANGED
@@ -1,12 +1,19 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen2-VL-2B-Instruct
 datasets:
 - rp-yu/VPT_Datasets
 language:
 - en
+library_name: transformers
+license: apache-2.0
 metrics:
 - accuracy
-base_model:
-- Qwen/Qwen2-VL-2B-Instruct
-library_name: transformers
----
+pipeline_tag: image-text-to-text
+---
+
+# Introducing Visual Perception Token into Multimodal Large Language Model
+
+This repository contains models based on the paper [Introducing Visual Perception Token into Multimodal Large Language Model](https://arxiv.org/abs/2502.17425). These models utilize Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).
+
+Code: https://github.com/yu-rp/VisualPerceptionToken