kimyoungjune committed on
Commit 18f671b · verified · 1 parent: 47b1d3d

Update README.md

Files changed (1)
  1. README.md +10 -5
README.md CHANGED
@@ -17,9 +17,14 @@ pipeline_tag: image-text-to-text
 
  # VARCO-VISION-14B-HF
 
- ## About the Model
+ ## 🚨News🎙️
+ - The 2.0 model has been released. Please use the new version.
+ - 📰 2025-07-16: We released VARCO-VISION-2.0-14B at [link](https://huggingface.co/NCSOFT/VARCO-VISION-2.0-14B)
+ - 📰 2025-07-16: We released GME-VARCO-VISION-Embedding at [link](https://huggingface.co/NCSOFT/GME-VARCO-VISION-Embedding)
 
- **VARCO-VISION-14B** is a powerful English-Korean Vision-Language Model (VLM). The training pipeline of VARCO-VISION consists of four stages: Feature Alignment Pre-training, Basic Supervised Fine-tuning, Advanced Supervised Fine-tuning, and Preference Optimization. In both multimodal and text-only benchmarks, VARCO-VISION-14B not only surpasses other models of similar size in performance but also achieves scores comparable to those of proprietary models. The model currently accepts a single image and a text as inputs, generating an output text. It supports grounding, referring as well as OCR (Optical Character Recognition).
+ ## About the VARCO-VISION-1.0-14B Model
+
+ **VARCO-VISION-14B** is a powerful English-Korean Vision-Language Model (VLM). The training pipeline of VARCO-VISION consists of four stages: Feature Alignment Pre-training, Basic Supervised Fine-tuning, Advanced Supervised Fine-tuning, and Preference Optimization. In both multimodal and text-only benchmarks, VARCO-VISION-14B not only surpasses other models of similar size in performance but also achieves scores comparable to those of proprietary models. The Model currently accepts a single image and a text as inputs, generating an output text. It supports grounding, referring as well as OCR (Optical Character Recognition).
 
  - **Developed by:** NC Research, Multimodal Generation Team
  - **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
@@ -116,7 +121,7 @@ conversation = [
  {
  "role": "user",
  "content": [
- {"type": "text", "text": "<gro>\nDescribe the image in detail."},
+ {"type": "text", "text": "\nDescribe the image in detail."},
  {"type": "image"},
  ],
  },
@@ -141,7 +146,7 @@ conversation = [
  "content": [
  {
  "type": "text",
- "text": "<obj>이 물건</obj><bbox>0.039, 0.138, 0.283, 0.257</bbox>은 어떻게 쓰는거야?",
+ "text": " 물건0.039, 0.138, 0.283, 0.257 어떻게 쓰는거야?",
  },
  {"type": "image"},
  ],
@@ -166,7 +171,7 @@ conversation = [
  {
  "role": "user",
  "content": [
- {"type": "text", "text": "<ocr>"},
+ {"type": "text", "text": ""},
  {"type": "image"},
  ],
  },
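The three README examples touched by this commit share one shape: a single user turn holding a text item and an image slot, where the text carries a task token. Based on the pre-commit lines, these tokens are `<gro>` for grounding, `<obj>…</obj><bbox>…</bbox>` for referring (the Korean prompt asks "How do I use this object?" about the boxed region), and `<ocr>` for OCR. The sketch below builds those conversations and parses boxes back out. It is a minimal illustration under stated assumptions, not the model's documented API: every function name is hypothetical, boxes are assumed to be normalized `(x1, y1, x2, y2)` floats in [0, 1], and model outputs are assumed to reuse the same `<obj>`/`<bbox>` markup.

```python
# A minimal sketch of prompt helpers for VARCO-VISION-14B-HF.
# ASSUMPTIONS: task tokens (<gro>, <obj>/<bbox>, <ocr>) follow the pre-commit
# README examples; boxes are normalized (x1, y1, x2, y2) in [0, 1];
# all function names here are hypothetical, not part of any library.
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]

# Matches one <obj>label</obj><bbox>x1, y1, x2, y2</bbox> pair (assumed format).
_PAIR = re.compile(r"<obj>(.*?)</obj><bbox>(.*?)</bbox>", re.DOTALL)


def build_conversation(text: str) -> list:
    """Single user turn with one text item and one image slot,
    mirroring the structure of the README examples."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "image"},
            ],
        },
    ]


def grounding_prompt(instruction: str) -> str:
    # The <gro> prefix requests grounded output (assumed from the README example).
    return f"<gro>\n{instruction}"


def referring_prompt(label: str, box: Box, rest: str) -> str:
    # Coordinates are written with 3 decimals, as in the README example.
    coords = ", ".join(f"{v:.3f}" for v in box)
    return f"<obj>{label}</obj><bbox>{coords}</bbox>{rest}"


def ocr_prompt() -> str:
    # The OCR example sends the bare <ocr> token as the whole text item.
    return "<ocr>"


def parse_boxes(text: str) -> List[Tuple[str, Box]]:
    """Extract (label, box) pairs; raises ValueError on a malformed box."""
    pairs = []
    for label, raw in _PAIR.findall(text):
        values = [float(v) for v in raw.split(",")]
        if len(values) != 4:
            raise ValueError(f"expected 4 coordinates, got {raw!r}")
        x1, y1, x2, y2 = values
        pairs.append((label, (x1, y1, x2, y2)))
    return pairs


def to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    # Scale normalized x by image width and normalized y by image height.
    x1, y1, x2, y2 = box
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))
```

For example, `build_conversation(grounding_prompt("Describe the image in detail."))` reproduces the grounding request from the pre-commit lines, and `referring_prompt("이 물건", (0.039, 0.138, 0.283, 0.257), "은 어떻게 쓰는거야?")` reproduces the referring prompt. In a transformers workflow, such a list would typically be rendered with `processor.apply_chat_template(conversation, add_generation_prompt=True)` before being passed to the processor with the image. Keeping prompt construction and box parsing as plain string functions makes them easy to test without loading the 14B model.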