One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Zechen Bai¹, Tong He², Haiyang Mei¹, Pichao Wang², Ziteng Gao¹, Joya Chen¹, Lei Liu², Zheng Zhang², Mike Zheng Shou¹

NeurIPS 2024

¹ Show Lab, National University of Singapore   ² Amazon

arXiv

Please find the code at: https://github.com/showlab/VideoLISA

Safetensors · Model size: 4.48B params · Tensor types: F32, BF16

Model: ZechenBai/VideoLISA-3.8B