update
README.md CHANGED
@@ -194,7 +194,7 @@ Our training data consists of:

 * Generic Video SFT Data: [ShareGPT4Video](https://sharegpt4video.github.io/) and [LLaVA-Video](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K).

-* Instructional Video Data: [Ego4d](https://ego4d-data.org/), [Somethingv2](https://www.qualcomm.com/developer/software/something-something-v-2-dataset), [Epic-Kitchen](https://epic-kitchens.github.io/2025)
+* Instructional Video Data: [Ego4d](https://ego4d-data.org/), [Somethingv2](https://www.qualcomm.com/developer/software/something-something-v-2-dataset), [Epic-Kitchen](https://epic-kitchens.github.io/2025) and other related instructional videos.

 * Robotics Manipulation Data: [Open-X-Embodiment](https://robotics-transformer-x.github.io/).

@@ -276,7 +276,7 @@ We evaluate the model's performance after finetuning on the following datasets:

 * Multimodal Image Understanding and Reasoning: [VQAv2](https://visualqa.org/), [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html), [MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation), [POPE](https://huggingface.co/datasets/lmms-lab/POPE), [TextVQA](https://textvqa.org/), [ChartQA](https://github.com/vis-nlp/ChartQA), [DocVQA](https://www.docvqa.org/).

-* Multimodal Video Understanding and Reasoning: [Next-QA](https://github.com/doc-doc/NExT-QA), [VideoMME](https://video-mme.github.io/home_page.html), [MVBench](https://
+* Multimodal Video Understanding and Reasoning: [Next-QA](https://github.com/doc-doc/NExT-QA), [VideoMME](https://video-mme.github.io/home_page.html), [MVBench](https://huggingface.co/datasets/OpenGVLab/MVBench).

 #### Metrics
 <!-- {{ testing_metrics | default("[More Information Needed]", true)}} -->
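Several of the datasets referenced in these hunks (LLaVA-Video-178K, POPE, MVBench) are hosted on the Hugging Face Hub and can be pulled with the `datasets` library. Below is a minimal sketch for one of them; the repo id `lmms-lab/POPE` is taken from the link above, but the split and field names are assumptions and should be verified against the dataset card.

```python
# Minimal sketch (not part of the original README): loading one of the
# Hub-hosted evaluation sets referenced above with the `datasets` library.
# The repo id "lmms-lab/POPE" comes from the link in the diff; the split
# and field names below are assumptions -- check the dataset card.
from datasets import load_dataset

pope = load_dataset("lmms-lab/POPE", split="test")  # assumed split name

for example in pope.select(range(3)):
    # POPE pairs an image with a yes/no question about object presence.
    print(example["question"], "->", example["answer"])
```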