Update README.md
README.md CHANGED
@@ -22,16 +22,26 @@ Chain-of-thought (CoT) reasoning greatly improves the interpretability and probl

UV-CoT achieves this by performing preference comparisons between model-generated bounding boxes. It generates preference data automatically, then uses an evaluator MLLM (e.g., OmniLMM-12B) to rank responses, which serves as supervision to train the target MLLM (e.g., LLaVA-1.5-7B). This approach emulates human perception—identifying key regions and reasoning based on them—thereby improving visual comprehension, particularly in spatial reasoning tasks.

-
+<!--  -->

-<
+<div align="center">
+<img src="./images/fig1.svg" alt="Figure 1: UV-CoT Overview" width="1200px" />
+</div>

## Visualizations

Qualitative examples demonstrating UV-CoT's visual reasoning:

-
-
+<div align="center">
+<img src="./images/fig5_v1.2.svg" alt="Figure 5: UV-CoT Visualization 1" width="1200px" />
+</div>
+
+<div align="center">
+<img src="./images/fig6_v1.2.svg" alt="Figure 6: UV-CoT Visualization 2" width="1200px" />
+</div>
+
+<!--
+ -->

## Installation

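For context on the paragraph carried in the hunk above: the automatic preference-data loop it describes can be summarized in a few lines of Python. The sketch below is illustrative only; `build_preference_pairs`, `generate`, and `score` are hypothetical names, not functions from this repository.

```python
# Minimal sketch of the preference-data loop described in the README
# paragraph above. Every name here is a hypothetical placeholder,
# not part of the UV-CoT codebase.
from typing import Any, Callable, List, Tuple

Response = dict  # e.g., {"bbox": (x1, y1, x2, y2), "answer": "..."}

def build_preference_pairs(
    image: Any,
    question: str,
    generate: Callable[[Any, str], Response],      # target MLLM, e.g., LLaVA-1.5-7B
    score: Callable[[Any, str, Response], float],  # evaluator MLLM, e.g., OmniLMM-12B
    n_candidates: int = 4,
) -> List[Tuple[Response, Response]]:
    """Sample candidate (bounding-box, answer) responses, rank them with the
    evaluator, and return (chosen, rejected) pairs as preference supervision."""
    # 1. The target model proposes several responses, each grounded in a
    #    bounding box it selected itself.
    candidates = [generate(image, question) for _ in range(n_candidates)]
    # 2. The evaluator's scores stand in for human preference labels.
    ranked = sorted(candidates, key=lambda r: score(image, question, r), reverse=True)
    # 3. Best-vs-worst pairs feed a preference-optimization objective
    #    that trains the target model.
    return [(ranked[0], ranked[-1])]
```

The point the sketch makes explicit is that the evaluator's ranking, not a human annotator, supplies the chosen/rejected labels, which is what makes the pipeline unsupervised.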