internlm/internlm-xcomposer2d5-7b • Visual Question Answering • Updated Jul 22, 2024 • 69.7k • 204
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding • Paper • 2501.13106 • Published Jan 22 • 83