VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published 3 days ago • 55
Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google 8 days ago • 59
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 6 days ago • 116
Running 1.67k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
HuggingFaceTB/SmolVLM2-256M-Video-Instruct Video-Text-to-Text • Updated about 13 hours ago • 1.95k • 26
HuggingFaceTB/SmolVLM2-500M-Video-Instruct Video-Text-to-Text • Updated about 13 hours ago • 1.92k • 31