arxiv:2410.19702
Yansong Shi
nanamma
AI & ML interests
multimodality, video understanding, robotics
Recent Activity
upvoted a paper
about 1 month ago
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
authored a paper
3 months ago
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
authored a paper
3 months ago
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning