V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated 1 day ago • 86
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • 12 days ago • 134
SmolVLA Collection Small, efficient and light-weight VLAs pretrained on community datasets • 1 item • Updated 14 days ago • 25
TokenFlow Collection models in "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation" • 5 items • Updated Dec 10, 2024 • 1
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction Paper • 2505.21473 • Published 19 days ago • 15
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • 25 days ago • 151
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ Paper • 2405.15306 • Published May 24, 2024 • 8
DeTikZify Collection Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ • 12 items • Updated Mar 19 • 28
🌞 May 2025 - Open works from the Chinese community Collection • 43 items • Updated about 2 hours ago • 8
Cosmos-Reason1 Collection Multimodal world understanding through reasoning • 5 items • Updated 4 days ago • 29
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 437
🍓 Ichigo v0.5 Collection The experimental family designed to train LLMs to understand sound natively. • 2 items • Updated Apr 22 • 4
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant Paper • 2410.15316 • Published Oct 20, 2024 • 12