[ICLR 2026] VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
Ye Liu
yeliudev
AI & ML interests
Vision & Language
Recent Activity
upvoted a paper 13 days ago
Code2World: A GUI World Model via Renderable Code Generation
updated a model 29 days ago
yeliudev/VideoMind-2B-FT-QVHighlights
updated a dataset 29 days ago
yeliudev/VideoMind-Dataset
Organizations
UniPixel
[NeurIPS 2025] UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
- UniPixel (Space, running on Zero): An MLLM for Unified Object Referring and Segmentation
- PolyU-ChenLab/UniPixel-3B: Video-Text-to-Text • 4B • 239 • 3
- PolyU-ChenLab/UniPixel-7B: Video-Text-to-Text • 8B • 90 • 1
- PolyU-ChenLab/UniPixel-SFT-1M: Preview • 1.06k • 2
R2-Tuning
[ECCV 2024] R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
VideoMind
[ICLR 2026] VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
E.T. Bench
[NeurIPS 2024] E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding