Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • about 22 hours ago • 35
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published 1 day ago • 42
Article CodeAgents + Structure: A Better Way to Execute Actions By akseljoonas and 1 other • May 28, 2024 • 36
Article Exploring Quantization Backends in Diffusers By derekl35 and 2 others • 14 days ago • 31
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • 14 days ago • 129
Article Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance By tiiuae and 5 others • 14 days ago • 25
Article Highlights from the First ICLR 2025 Watermarking Workshop By hadyelsahar and 4 others • 20 days ago • 10
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 23 days ago • 406
Article LeRobot Community Datasets: The “ImageNet” of Robotics — When and How? By danaaubakirova and 6 others • 24 days ago • 52
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions Paper • 2505.06111 • Published 25 days ago • 24
Article Cohere on Hugging Face Inference Providers 🔥 By burtenshaw and 6 others • Apr 16 • 126
Article Advancing European AI Sovereignty Through Racine.ai Flantier Open-Source Multimodal Models By paulml • Mar 26 • 10
Article DeepSearch Using Visual RAG in Agentic Frameworks 🔎 By paultltc and 1 other • Mar 21 • 33
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 284
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts Paper • 2503.22952 • Published Mar 29 • 18