Update README.md
README.md
CHANGED
@@ -17,9 +17,14 @@ pipeline_tag: image-text-to-text
17 |
18 |   # VARCO-VISION-14B
19 |
20 | - ##
21 |
22 | -
23 |
24 |   - **Developed by:** NC Research, Multimodal Generation Team
25 |   - **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
@@ -146,7 +151,7 @@ conversation = [
146 |       {
147 |           "role": "user",
148 |           "content": [
149 | -             {"type": "text", "text": "
150 |               {"type": "image"},
151 |           ],
152 |       },
@@ -171,7 +176,7 @@ conversation = [
171 |           "content": [
172 |               {
173 |                   "type": "text",
174 | -                 "text": "
175 |               },
176 |               {"type": "image"},
177 |           ],
@@ -196,7 +201,7 @@ conversation = [
196 |       {
197 |           "role": "user",
198 |           "content": [
199 | -             {"type": "text", "text": "
200 |               {"type": "image"},
201 |           ],
202 |       },

17 |
18 |   # VARCO-VISION-14B
19 |
20 | + ## 🚨News🎙️
21 | + - The 2.0 model has been released. Please use the new version.
22 | + - 📰 2025-07-16: We released VARCO-VISION-2.0-14B at [link](https://huggingface.co/NCSOFT/VARCO-VISION-2.0-14B)
23 | + - 📰 2025-07-16: We released GME-VARCO-VISION-Embedding at [link](https://huggingface.co/NCSOFT/GME-VARCO-VISION-Embedding)
24 |
25 | + ## About the VARCO-VISION-1.0-14B Model
26 | +
27 | + **VARCO-VISION-14B** is a powerful English-Korean Vision-Language Model (VLM). The training pipeline of VARCO-VISION consists of four stages: Feature Alignment Pre-training, Basic Supervised Fine-tuning, Advanced Supervised Fine-tuning, and Preference Optimization. In both multimodal and text-only benchmarks, VARCO-VISION-14B not only surpasses other models of similar size in performance but also achieves scores comparable to those of proprietary models. The model currently accepts a single image and a text prompt as input and generates text output. It supports grounding, referring, and OCR (Optical Character Recognition).
28 |
29 |   - **Developed by:** NC Research, Multimodal Generation Team
30 |   - **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)

151 |       {
152 |           "role": "user",
153 |           "content": [
154 | +             {"type": "text", "text": "\nDescribe the image in detail."},
155 |               {"type": "image"},
156 |           ],
157 |       },

176 |           "content": [
177 |               {
178 |                   "type": "text",
179 | +                 "text": "이 물건0.039, 0.138, 0.283, 0.257은 어떻게 쓰는거야?",
180 |               },
181 |               {"type": "image"},
182 |           ],

201 |       {
202 |           "role": "user",
203 |           "content": [
204 | +             {"type": "text", "text": ""},
205 |               {"type": "image"},
206 |           ],
207 |       },
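
The three changed hunks all edit the text part of a single user turn that pairs a text entry with an `{"type": "image"}` placeholder. As a minimal sketch of that message shape, the helpers below rebuild the entries as they appear in this diff. The names `user_turn` and `format_box` are hypothetical and not part of the README. The reading of the four numbers as a normalized `(x1, y1, x2, y2)` box is an assumption, and any special tokens the model may expect around the text are not visible in this diff and are not reproduced here. The Korean prompt translates roughly to "How do I use this object?".

```python
from typing import Sequence


def user_turn(text: str) -> dict:
    """Build one user message: a text part followed by an image placeholder."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image"},
        ],
    }


def format_box(box: Sequence[float]) -> str:
    """Render four normalized coordinates as a comma-separated string.

    Assumption: the values are (x1, y1, x2, y2) in the range [0, 1].
    """
    if len(box) != 4 or not all(0.0 <= v <= 1.0 for v in box):
        raise ValueError("expected four coordinates in [0, 1]")
    return ", ".join(f"{v:.3f}" for v in box)


# Text exactly as it appears on the added line 154 of the diff.
conversation = [user_turn("\nDescribe the image in detail.")]

# Text as it appears on the added line 179, with the box string generated.
box = (0.039, 0.138, 0.283, 0.257)
referring = [user_turn(f"이 물건{format_box(box)}은 어떻게 쓰는거야?")]
```

Keeping the box formatting in one helper makes it harder to pass unnormalized pixel coordinates by mistake, since out-of-range values raise an error before the prompt is built.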