Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Paper • 2506.03065 • Published 4 days ago • 27
Article KV Cache from scratch in nanoVLM By ariG23498 and 4 others • 4 days ago • 56
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs Paper • 2506.01674 • Published 6 days ago • 26
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks Paper • 2504.19854 • Published Apr 28 • 7
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks Paper • 2506.00411 • Published 8 days ago • 28
Taming LLMs by Scaling Learning Rates with Gradient Grouping Paper • 2506.01049 • Published 6 days ago • 36
Holo1 Collection Vision-Language Action Model for use in Surfer-H web navigation agent • 5 items • Updated 3 days ago • 39
Article Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H By Hcompany and 1 other • 5 days ago • 60