vision language models - an adamelliotfields Collection

updated Apr 9

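papers and models 🙈

Paper entries list the arXiv ID, publication date, and upvote count. Model entries list the task and parameter count (where shown), last update, download count, and like count.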

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 122
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 78
mistralai/Pixtral-12B-2409

Updated 24 minutes ago • 6.01k • 656
HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • 2B • Updated Apr 8 • 96.5k • 524
showlab/ShowUI-2B

Updated Mar 11 • 6.34k • 263
microsoft/Phi-3-vision-128k-instruct

Text Generation • 4B • Updated Aug 20, 2024 • 21.5k • 963
mtgv/MobileVLM_V2-1.7B

Text Generation • Updated Feb 7, 2024 • 8.39k • 28
mtgv/MobileVLM_V2-3B

Text Generation • Updated Feb 7, 2024 • 330 • 7
xtuner/llava-phi-3-mini

Image-Text-to-Text • 4B • Updated Apr 25, 2024 • 76 • 25
rhymes-ai/Aria

Image-Text-to-Text • 25B • Updated Apr 23 • 21.1k • 633
zai-org/glm-edge-v-2b

Image-Text-to-Text • 2B • Updated Jan 2 • 3.26k • 11
zai-org/glm-edge-v-5b

Image-Text-to-Text • 5B • Updated Jan 2 • 607 • 12
h2oai/h2ovl-mississippi-2b

Text Generation • 2B • Updated Dec 13, 2024 • 117k • 37
google/paligemma2-3b-pt-448

Image-Text-to-Text • 3B • Updated Dec 5, 2024 • 8.51k • 46