SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7 • 188
Vision Language Models Quantization Collection Vision Language Models (VLMs) quantized by Neural Magic • 20 items • Updated Mar 4 • 6
MambaVision Collection MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes both 1K and 21K pretrained models. • 13 items • Updated 1 day ago • 31
MoshiVis v0.1 Collection MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs. • 8 items • Updated Mar 21 • 22
Article Welcome Gemma 3: Google's all-new multimodal, multilingual, long-context open LLM By ariG23498 and 3 others • Mar 12 • 427
Article SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20 • 262