Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar PRO
orrzohar
AI & ML interests
Large Multi-Modal Models, Foundation Models, Video Understanding
Recent Activity
upvoted a paper 38 minutes ago: Large Motion Video Autoencoding with Cross-modal Video VAE
upvoted a paper 38 minutes ago: Deliberation in Latent Space via Differentiable Cache Augmentation
upvoted a paper about 21 hours ago: OpenAI o1 System Card
Collections (2)
interesting Video-LLMs
- VoCo-LLaMA: Towards Vision Compression with Large Language Models
  Paper • 2406.12275 • Published • 29
- VILA: On Pre-training for Visual Language Models
  Paper • 2312.07533 • Published • 20
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 51
- Long Context Transfer from Language to Vision
  Paper • 2406.16852 • Published • 32