LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Jul 25, 2024 • 18
SmolVLM 256M & 500M Collection Collection for models & demos for even smoller SmolVLM release • 12 items • Updated 5 days ago • 59
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 129
Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 29
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Paper • 2412.10302 • Published Dec 13, 2024 • 12
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139
Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints May 1, 2024 • 69
Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy Sep 18, 2024 • 216
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14, 2024 • 57