SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • Jun 3
nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • May 21
Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12
SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20
SmolVLM Grows Smaller – Introducing the 250M & 500M Models! By andito and 2 others • Jan 23
SmolVLM - small yet mighty Vision Language Model By andito and 4 others • Nov 26, 2024
Deploying Speech-to-Speech on Hugging Face By andito and 3 others • Oct 22, 2024
LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? By danaaubakirova and 1 other • Jul 25, 2024
Docmatix - a huge dataset for Document Visual Question Answering By andito and 1 other • Jul 18, 2024
Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models By andito and 2 others • Jun 24, 2024