lbourdois committed · Commit 2c6270c · verified · 1 Parent(s): d8d663b

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed; I was therefore only able to add those 13.
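For reference, the codes a README actually declares can be extracted from its YAML front matter with a short stdlib-only script. This is an illustrative sketch (the `listed_languages` helper and the abbreviated front-matter sample are hypothetical, showing only three of the 13 codes added here):

```python
import re

# Abbreviated sample of the front matter this PR produces (first 3 of 13 codes).
README_FRONT_MATTER = """\
---
license: apache-2.0
language:
- zho
- eng
- fra
---
"""

def listed_languages(text):
    """Return the ISO 639 codes under the `language:` key in YAML front matter."""
    m = re.search(r"^language:\n((?:- \w+\n)+)", text, re.MULTILINE)
    if not m:
        return []
    # Each captured line looks like "- zho"; drop the "- " prefix.
    return [line[2:] for line in m.group(1).strip().splitlines()]

print(listed_languages(README_FRONT_MATTER))  # → ['zho', 'eng', 'fra']
```

Comparing this list against the languages announced in the model's documentation is how the 13-vs-29 gap noted above can be spotted.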

Files changed (1)
  1. README.md +45 -34
README.md CHANGED
@@ -1,35 +1,46 @@
- ---
- license: apache-2.0
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- pipeline_tag: text-generation
- language:
- - en
- - zh
- ---
- # Insight-V-Reason
-
- ## Model Summary
-
- The Insight-V models are 7B parameter models based on Qwen2.5 language model with a context window of 32K tokens.
-
- Insight-V offers **1)** a scalable data generation pipeline for long-chain, high-quality reasoning data, **2)** a multi-agent system that decomposes visual reasoning tasks into reasoning and summarization, and **3)** a two-stage training pipeline to enhance visual reasoning capabilities. Together, these contributions address key challenges in visual reasoning, providing a solid foundation for future research in MLLM reasoning.
-
- - **Repository:** https://github.com/dongyh20/Insight-V
- - **Languages:** English, Chinese
- - **Paper:** https://arxiv.org/abs/2411.14432
-
-
- ### Model Architecture
-
- - **Architecture:** Pre-trained [Oryx-ViT](https://huggingface.co/THUdyh/Oryx-ViT) + Qwen2.5-7B
- - **Data:** a mixture of 200k reasoning data
- - **Precision:** BFloat16
-
- #### Hardware & Software
-
- - **Hardware:** 64 * NVIDIA Tesla A100
- - **Orchestration:** HuggingFace Trainer
- - **Code:** Pytorch
-
+ ---
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ pipeline_tag: text-generation
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+ # Insight-V-Reason
+
+ ## Model Summary
+
+ The Insight-V models are 7B parameter models based on Qwen2.5 language model with a context window of 32K tokens.
+
+ Insight-V offers **1)** a scalable data generation pipeline for long-chain, high-quality reasoning data, **2)** a multi-agent system that decomposes visual reasoning tasks into reasoning and summarization, and **3)** a two-stage training pipeline to enhance visual reasoning capabilities. Together, these contributions address key challenges in visual reasoning, providing a solid foundation for future research in MLLM reasoning.
+
+ - **Repository:** https://github.com/dongyh20/Insight-V
+ - **Languages:** English, Chinese
+ - **Paper:** https://arxiv.org/abs/2411.14432
+
+
+ ### Model Architecture
+
+ - **Architecture:** Pre-trained [Oryx-ViT](https://huggingface.co/THUdyh/Oryx-ViT) + Qwen2.5-7B
+ - **Data:** a mixture of 200k reasoning data
+ - **Precision:** BFloat16
+
+ #### Hardware & Software
+
+ - **Hardware:** 64 * NVIDIA Tesla A100
+ - **Orchestration:** HuggingFace Trainer
+ - **Code:** Pytorch
+
  ## Citation