Update README.md
README.md
CHANGED
@@ -17,9 +17,14 @@ pipeline_tag: image-text-to-text
17
18   # VARCO-VISION-14B-HF
19
20 - ##
21
22 -
23
24   - **Developed by:** NC Research, Multimodal Generation Team
25   - **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
@@ -116,7 +121,7 @@ conversation = [
116       {
117           "role": "user",
118           "content": [
119 -             {"type": "text", "text": "
120               {"type": "image"},
121           ],
122       },
@@ -141,7 +146,7 @@ conversation = [
141           "content": [
142               {
143                   "type": "text",
144 -                 "text": "
145               },
146               {"type": "image"},
147           ],
@@ -166,7 +171,7 @@ conversation = [
166       {
167           "role": "user",
168           "content": [
169 -             {"type": "text", "text": "
170               {"type": "image"},
171           ],
172       },
@@ -17,9 +17,14 @@ pipeline_tag: image-text-to-text
17
18   # VARCO-VISION-14B-HF
19
20 + ## 🚨News🎙️
21 + - The 2.0 model has been released. Please use the new version.
22 + - 📰 2025-07-16: We released VARCO-VISION-2.0-14B at [link](https://huggingface.co/NCSOFT/VARCO-VISION-2.0-14B)
23 + - 📰 2025-07-16: We released GME-VARCO-VISION-Embedding at [link](https://huggingface.co/NCSOFT/GME-VARCO-VISION-Embedding)
24
25 + ## About the VARCO-VISION-1.0-14B Model
26 +
27 + **VARCO-VISION-14B** is a powerful English-Korean Vision-Language Model (VLM). Its training pipeline consists of four stages: Feature Alignment Pre-training, Basic Supervised Fine-tuning, Advanced Supervised Fine-tuning, and Preference Optimization. On both multimodal and text-only benchmarks, VARCO-VISION-14B surpasses other models of similar size and achieves scores comparable to those of proprietary models. The model currently accepts a single image and a text prompt as input and generates text as output. It supports grounding, referring, and OCR (Optical Character Recognition).
28
29   - **Developed by:** NC Research, Multimodal Generation Team
30   - **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
@@ -116,7 +121,7 @@ conversation = [
121       {
122           "role": "user",
123           "content": [
124 +             {"type": "text", "text": "\nDescribe the image in detail."},
125               {"type": "image"},
126           ],
127       },
@@ -141,7 +146,7 @@ conversation = [
146           "content": [
147               {
148                   "type": "text",
149 +                 "text": "이 물건0.039, 0.138, 0.283, 0.257은 어떻게 쓰는거야?",
150               },
151               {"type": "image"},
152           ],
@@ -166,7 +171,7 @@ conversation = [
171       {
172           "role": "user",
173           "content": [
174 +             {"type": "text", "text": ""},
175               {"type": "image"},
176           ],
177       },
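
The three changed hunks all edit the same message shape: a user turn whose `content` lists a text part followed by an image placeholder. A minimal Python sketch of that shape is given here for orientation. The helper name `build_user_turn` is hypothetical and is not part of the README or of any library, and the prompt string is only an illustration.

```python
from typing import Any


def build_user_turn(text: str) -> list[dict[str, Any]]:
    """Return a one-turn conversation: a text part followed by an image placeholder.

    Hypothetical helper; it only mirrors the dict layout visible in the diff.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "image"},
            ],
        }
    ]


# Illustrative prompt; any task-specific prefix is left to the caller.
conversation = build_user_turn("Describe the image in detail.")
```

The resulting list has the same layout as the `conversation` variable edited in the hunks. How it is passed on to the model's processor is outside the scope of this sketch.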