Orr Zohar PRO

orrzohar

https://orrzohar.github.io

AI & ML interests

Large Multi-Modal Models, Foundation Models, Video Understanding

Recent Activity

updated a collection 5 days ago

New activity in google/gemma-3-4b-it 4 months ago

SigLIP or SigLIP2 encoder?

#37 opened 5 months ago by

commented 2 papers 5 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 197

New activity in google/gemma-3-27b-it 5 months ago

SigLIP or SigLIP2 encoder?

#48 opened 5 months ago by

New activity in HuggingFaceTB/SmolVLM2-2.2B-Instruct 6 months ago

Input Video length constraints

#6 opened 6 months ago by

Several questions on the same video

#8 opened 6 months ago by

checkpoint you are trying to load has model type `smolvlm` but Transformers does not recognize this

#7 opened 6 months ago by

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (CUDABFloat16Type) should be the same

#4 opened 6 months ago by

Using pre-computed embeddings for images/frames and using as input

#2 opened 6 months ago by

commented 2 papers 8 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

New activity in lmms-lab/LLaVA-OneVision-Data 11 months ago

Missing/corrupted images in dataset

#9 opened 11 months ago by

commented 4 papers about 1 year ago

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 53

$VILA^2$: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 42

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Paper • 2407.06189 • Published Jul 8, 2024 • 27

New activity in HuggingFaceM4/idefics2-8b about 1 year ago

Idefics2-pretraining

#54 opened over 1 year ago by

New activity in meta-llama/Meta-Llama-3-8B-Instruct over 1 year ago

The request to access the repo has been sent for several days, why hasn't it passed yet?

#70 opened over 1 year ago by