kimyoungjune committed · Commit afcbec0 · verified · 1 parent: a311c4f

Update README.md

Files changed (1)
  1. README.md +10 -5
README.md CHANGED
@@ -17,9 +17,14 @@ pipeline_tag: image-text-to-text

  # VARCO-VISION-14B

- ## About the Model

- **VARCO-VISION-14B** is a powerful English-Korean Vision-Language Model (VLM). The training pipeline of VARCO-VISION consists of four stages: Feature Alignment Pre-training, Basic Supervised Fine-tuning, Advanced Supervised Fine-tuning, and Preference Optimization. In both multimodal and text-only benchmarks, VARCO-VISION-14B not only surpasses other models of similar size in performance but also achieves scores comparable to those of proprietary models. The model currently accepts a single image and a text as inputs, generating an output text. It supports grounding, referring as well as OCR (Optical Character Recognition).

  - **Developed by:** NC Research, Multimodal Generation Team
  - **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
@@ -146,7 +151,7 @@ conversation = [
      {
          "role": "user",
          "content": [
-             {"type": "text", "text": "<gro>\nDescribe the image in detail."},
              {"type": "image"},
          ],
      },
@@ -171,7 +176,7 @@ conversation = [
          "content": [
              {
                  "type": "text",
-                 "text": "<obj>이 물건</obj><bbox>0.039, 0.138, 0.283, 0.257</bbox>은 어떻게 쓰는거야?",
              },
              {"type": "image"},
          ],
@@ -196,7 +201,7 @@ conversation = [
      {
          "role": "user",
          "content": [
-             {"type": "text", "text": "<ocr>"},
              {"type": "image"},
          ],
      },
 

  # VARCO-VISION-14B

+ ## 🚨News🎙️
+ - The 2.0 model has been released. Please use the new version.
+ - 📰 2025-07-16: We released VARCO-VISION-2.0-14B at [link](https://huggingface.co/NCSOFT/VARCO-VISION-2.0-14B)
+ - 📰 2025-07-16: We released GME-VARCO-VISION-Embedding at [link](https://huggingface.co/NCSOFT/GME-VARCO-VISION-Embedding)

+ ## About the VARCO-VISION-1.0-14B Model
+
+ **VARCO-VISION-14B** is a powerful English-Korean Vision-Language Model (VLM). The training pipeline of VARCO-VISION consists of four stages: Feature Alignment Pre-training, Basic Supervised Fine-tuning, Advanced Supervised Fine-tuning, and Preference Optimization. In both multimodal and text-only benchmarks, VARCO-VISION-14B not only surpasses other models of similar size in performance but also achieves scores comparable to those of proprietary models. The Model currently accepts a single image and a text as inputs, generating an output text. It supports grounding, referring as well as OCR (Optical Character Recognition).

  - **Developed by:** NC Research, Multimodal Generation Team
  - **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
 
      {
          "role": "user",
          "content": [
+             {"type": "text", "text": "\nDescribe the image in detail."},
              {"type": "image"},
          ],
      },
 
          "content": [
              {
                  "type": "text",
+                 "text": " 물건0.039, 0.138, 0.283, 0.257 어떻게 쓰는거야?",
              },
              {"type": "image"},
          ],
 
      {
          "role": "user",
          "content": [
+             {"type": "text", "text": ""},
              {"type": "image"},
          ],
      },
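
The three code hunks in this diff all touch the text part of a user turn: one for grounding, one for referring, and one for OCR. As a rough illustration of how such turns might be assembled, here is a minimal sketch. It assumes the special tokens visible on the removed (`-`) lines (`<gro>`, `<obj>`, `<bbox>`, `<ocr>`) are what the model expects. The added (`+`) lines in this view show the prompts without them, so treat the tokens as an assumption rather than a confirmed part of the updated README. The helper names `user_turn` and `referring_text` are hypothetical and do not come from the README.

```python
from typing import Dict, List, Tuple

# Special tokens copied from the removed ("-") lines of this diff.
# Assumption: they are still the tokens the model expects.
GROUNDING_TOKEN = "<gro>"
OCR_TOKEN = "<ocr>"


def user_turn(text: str) -> Dict[str, object]:
    """Build one user turn: a text part followed by a single image placeholder."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image"},
        ],
    }


def referring_text(
    phrase: str, bbox: Tuple[float, float, float, float], rest: str
) -> str:
    """Wrap a phrase in <obj> tags and attach its four box values in <bbox> tags.

    The four numbers are passed through as they appear in the diff; this
    sketch makes no claim about their coordinate convention.
    """
    coords = ", ".join(f"{value:.3f}" for value in bbox)
    return f"<obj>{phrase}</obj><bbox>{coords}</bbox>{rest}"


# Grounding: the token, a newline, then the instruction.
grounding: List[Dict[str, object]] = [
    user_turn(f"{GROUNDING_TOKEN}\nDescribe the image in detail.")
]

# Referring: the Korean prompt reads roughly "How do you use this object?"
referring: List[Dict[str, object]] = [
    user_turn(
        referring_text("이 물건", (0.039, 0.138, 0.283, 0.257), "은 어떻게 쓰는거야?")
    )
]

# OCR: the token alone is the whole text prompt.
ocr: List[Dict[str, object]] = [user_turn(OCR_TOKEN)]
```

Each list has the same shape as the `conversation` value in the hunks above, with one text part and one image part, matching the README's statement that the model accepts a single image and a text as inputs.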