kesenZhaoNTU/UV-CoT

Image-Text-to-Text

text-generation

chain-of-thought


kesenZhaoNTU committed on Jul 17

Commit 5ee53d9 (verified) · 1 Parent(s): 47bf470

Update README.md

Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -22,14 +22,14 @@ Chain-of-thought (CoT) reasoning greatly improves the interpretability and probl
 UV-CoT achieves this by performing preference comparisons between model-generated bounding boxes. It generates preference data automatically, then uses an evaluator MLLM (e.g., OmniLLM-12B) to rank responses, which serves as supervision to train the target MLLM (e.g., LLaVA-1.5-7B). This approach emulates human perception—identifying key regions and reasoning based on them—thereby improving visual comprehension, particularly in spatial reasoning tasks.
-![Figure 1: UV-CoT Overview](https://raw.githubusercontent.com/UV-CoT/UV-CoT/main/images/fig1.png)
+![Figure 1: UV-CoT Overview](./images/fig1.svg)
 ## Visualizations
 Qualitative examples demonstrating UV-CoT's visual reasoning:
-![Figure 5: UV-CoT Visualization 1](https://raw.githubusercontent.com/UV-CoT/UV-CoT/main/images/fig5.png)
-![Figure 6: UV-CoT Visualization 2](https://raw.githubusercontent.com/UV-CoT/UV-CoT/main/images/fig6.png)
+![Figure 5: UV-CoT Visualization 1](./images/fig5_v1.2.svg)
+![Figure 6: UV-CoT Visualization 2](./images/fig6_v1.2.svg)
 ## Installation
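
Note on the README context in this hunk: the unchanged paragraph describes ranking model-generated bounding boxes with an evaluator MLLM and using the ranking as supervision. A minimal sketch of how such rankings might be turned into (chosen, rejected) preference pairs follows. It is illustrative only and is not taken from the UV-CoT codebase: the `Candidate` class, the scalar `score` field, the `(x1, y1, x2, y2)` box format, and the `min_margin` parameter are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    """One model-generated bounding box and the answer reasoned from it.

    All fields are hypothetical; the real data format may differ.
    """
    bbox: tuple      # assumed (x1, y1, x2, y2) layout
    answer: str
    score: float     # assumed scalar score assigned by the evaluator MLLM


def build_preference_pairs(candidates, min_margin=0.0):
    """Return (chosen, rejected) pairs, ordered by evaluator score.

    Pairs whose score gap is at most `min_margin` are skipped, so ties
    and near-ties do not produce noisy supervision.
    """
    pairs = []
    for a, b in combinations(candidates, 2):
        if abs(a.score - b.score) <= min_margin:
            continue
        chosen, rejected = (a, b) if a.score > b.score else (b, a)
        pairs.append((chosen, rejected))
    return pairs
```

Pairs of this shape are the usual input to preference-optimization training, where the target model is pushed toward the chosen response and away from the rejected one; whether UV-CoT filters near-ties this way is not stated in the README excerpt.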