Article Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub By nvidia and 11 others • 15 days ago • 25
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated 29 days ago • 146
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! Jun 6 • 50
Holo1 Collection Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated Jun 10 • 48
AGUVIS: Unified Pure Vision GUI Agents Collection https://aguvis-project.github.io • 3 items • Updated Dec 20, 2024 • 6
MiniCPM-o & MiniCPM-V Collection Multimodal models with leading performance. • 18 items • Updated 2 days ago • 35
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 475
video-effects datasets Collection Smol datasets to emulate cool video effects like "squish", "dissolve", etc. Inspired by Pika effects. • 4 items • Updated Jan 28 • 4
AIMv2 Collection A collection of AIMv2 vision encoders covering a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 80
Coursera - Hands-on Data Centric Visual AI Collection This collection has the in-class lecture and homework datasets for the Coursera MOOC, Hands-on Data Centric Visual AI. • 4 items • Updated Jul 31, 2024 • 2
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 60
Article The CVPR Survival Guide: Discovering Research That's Interesting to YOU! By harpreetsahota • Jun 14, 2024 • 9
Article FiftyOne Computer Vision Datasets Come to the Hugging Face Hub By jamarks • Jun 3, 2024 • 12
Article SeeMoE: Implementing a MoE Vision Language Model from Scratch By AviSoori1x • Jun 23, 2024 • 34
Article seemore: Implement a Vision Language Model from Scratch By AviSoori1x • Jun 23, 2024 • 92
Article Streamline Computer Vision Workflows with Hugging Face Transformers and FiftyOne By jamarks • Feb 27, 2024 • 8
DeciDiffusion Models Collection The DeciDiffusion family of models are text-to-image diffusion models that are faster than Stable Diffusion v1.5 while generating images of comparable quality. • 4 items • Updated Jan 17, 2024 • 1