---
license: bsd-3-clause
pipeline_tag: video-text-to-text
---

# VideoMind-2B-FT-QVHighlights

<div style="display: flex; gap: 5px;">
<a href="https://arxiv.org/abs/2503.13444" target="_blank"><img src="https://img.shields.io/badge/arXiv-2503.13444-red"></a>
<a href="https://videomind.github.io/" target="_blank"><img src="https://img.shields.io/badge/Project-Page-brightgreen"></a>
<a href="https://github.com/yeliudev/VideoMind/blob/main/README.md" target="_blank"><img src="https://img.shields.io/badge/License-BSD--3--Clause-purple"></a>
<a href="https://github.com/yeliudev/VideoMind" target="_blank"><img src="https://img.shields.io/github/stars/yeliudev/VideoMind"></a>
</div>

VideoMind is a multi-modal agent framework that enhances video reasoning by emulating *human-like* processes, such as *breaking down tasks*, *localizing and verifying moments*, and *synthesizing answers*.

## 🔖 Model Details

### Model Description

- **Model type:** Multi-modal Large Language Model
- **Language(s):** English
- **License:** BSD-3-Clause

### More Details

Please refer to our [GitHub Repository](https://github.com/yeliudev/VideoMind) for more details about this model.

## 📖 Citation

Please cite our paper if you find this project helpful.

```bibtex
@article{liu2025videomind,
  title={VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning},
  author={Liu, Ye and Lin, Kevin Qinghong and Chen, Chang Wen and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2503.13444},
  year={2025}
}
```