view article Article Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens tokens and 11 languages By Quent-01 and 9 others • May 24, 2024 • 27
view article Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 17 days ago • 380
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published 23 days ago • 73
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated 23 days ago • 54
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Paper • 2504.18415 • Published Apr 25 • 42
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published May 16, 2024 • 33
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding Paper • 2503.13964 • Published Mar 18 • 19
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models Paper • 2312.02949 • Published Dec 5, 2023 • 15
NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering Paper • 2502.10868 • Published Feb 15 • 2
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents Paper • 2502.18017 • Published Feb 25 • 20
view article Article SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20 • 256
view article Article SigLIP 2: A better multilingual vision language encoder By ariG23498 and 2 others • Feb 21 • 163
Scalable Vision Language Model Training via High Quality Data Curation Paper • 2501.05952 • Published Jan 10 • 2