Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar (orrzohar)
AI & ML interests
Large Multi-Modal Models, Foundation Models, Video Understanding
Collections
interesting Video-LLMs
VoCo-LLaMA: Towards Vision Compression with Large Language Models (paper, arXiv 2406.12275)
VILA: On Pre-training for Visual Language Models (paper, arXiv 2312.07533)
LongVILA: Scaling Long-Context Visual Language Models for Long Videos (paper, arXiv 2408.10188)
Long Context Transfer from Language to Vision (paper, arXiv 2406.16852)