DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated 14 days ago • 272
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 161
view article Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • Jun 3 • 242
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 10 • 149
view article Article PaliGemma – Google's Cutting-Edge Open Vision Language Model By merve and 2 others • May 14, 2024 • 266
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 244
Android in the Wild: A Large-Scale Dataset for Android Device Control Paper • 2307.10088 • Published Jul 19, 2023 • 11