AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity Paper • 2410.02745 • Published Sep 20, 2024
MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation Paper • 2409.19937 • Published Sep 30, 2024
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning Paper • 2503.04812 • Published Mar 4, 2025