jw2yang committed
Commit f7db9ca · 1 Parent(s): 5e21a94
Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -194,7 +194,7 @@ Our training data consists of:
 
  * Generic Video SFT Data: [ShareGPT4Video](https://sharegpt4video.github.io/) and [LLaVA-Video](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K).
 
- * Instructional Video Data: [Ego4d](https://ego4d-data.org/), [Somethingv2](https://www.qualcomm.com/developer/software/something-something-v-2-dataset), [Epic-Kitchen](https://epic-kitchens.github.io/2025), [COIN](https://coin-dataset.github.io/).
+ * Instructional Video Data: [Ego4d](https://ego4d-data.org/), [Somethingv2](https://www.qualcomm.com/developer/software/something-something-v-2-dataset), [Epic-Kitchen](https://epic-kitchens.github.io/2025) and other related instructional videos.
 
  * Robotics Manipulation Data: [Open-X-Embodiment](https://robotics-transformer-x.github.io/).
 
@@ -276,7 +276,7 @@ We evaluate the model's performance after finetuning on the following datasets:
 
  * Multimodal Image Understanding and Reasoning: [VQAv2](https://visualqa.org/), [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html), [MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation), [POPE](https://huggingface.co/datasets/lmms-lab/POPE), [TextVQA](https://textvqa.org/), [ChartQA](https://github.com/vis-nlp/ChartQA), [DocVQA](https://www.docvqa.org/).
 
- * Multimodal Video Understanding and Reasoning: [Next-QA](https://github.com/doc-doc/NExT-QA), [VideoMME](https://video-mme.github.io/home_page.html), [MVBench](https://arxiv.org/abs/2311.17005).
+ * Multimodal Video Understanding and Reasoning: [Next-QA](https://github.com/doc-doc/NExT-QA), [VideoMME](https://video-mme.github.io/home_page.html), [MVBench](https://huggingface.co/datasets/OpenGVLab/MVBench).
 
  #### Metrics
  <!-- {{ testing_metrics | default("[More Information Needed]", true)}} -->
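
To sanity-check the updated MVBench link, here is a minimal sketch of fetching the benchmark's annotation files from the Hugging Face dataset repo referenced above. It assumes `huggingface_hub` is installed and that the QA annotations live under a `json/` folder in the repo; check the dataset card for the actual file layout before relying on these patterns.

```python
# Minimal sketch: download the MVBench QA annotations referenced in the
# evaluation list above. The "json/*" pattern is an assumption about the
# repo layout; the large video archives are deliberately skipped.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="OpenGVLab/MVBench",
    repo_type="dataset",
    allow_patterns=["json/*", "*.md"],
)
print(f"MVBench annotations downloaded to: {local_dir}")
```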