Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 6 items • Updated 1 day ago • 30
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 17 days ago • 171
FastRTC Custom UIs Collection A collection of FastRTC demos that showcase how to build a Custom UI for your server • 4 items • Updated 17 days ago • 2
EchoLLaMA: 3D-to-Speech with Multimodal AI Collection This collection contains the models and datasets used in the EchoLLaMA: 3D-to-Speech with Multimodal AI paper. • 4 items • Updated 17 days ago • 4
Llama 4 Collection Meta's new Llama 4 multimodal models, Scout & Maverick. Includes Dynamic GGUFs, 16-bit & Dynamic 4-bit uploads. Run & fine-tune them with Unsloth! • 15 items • Updated about 10 hours ago • 43
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published Mar 13 • 23
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated 6 days ago • 161
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos Paper • 2412.09401 • Published Dec 12, 2024 • 3
MambaVision Collection MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes both 1K and 21K pretrained models. • 13 items • Updated 1 day ago • 31
EuroBERT Collection Scaling Multilingual Encoders for European Languages • 4 items • Updated Mar 10 • 11