Article cocogold: training Marigold for text-grounded segmentation By pcuenq • about 13 hours ago • 17
Tar Collection Unifying Visual Understanding and Generation via Text-Aligned Representations • 5 items • Updated 7 days ago • 12
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Paper • 2506.21277 • Published 13 days ago • 15
Article Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub By nvidia and 11 others • 11 days ago • 25
Article Common Pitfalls in Sharing Open Source Models on Hugging Face (and How to Dodge Them) By FriendliAI and 2 others • 7 days ago • 21
Article Bringing Fusion Down to Earth: ML for Stellarator Optimization By cgeorgiaw • 7 days ago • 62
Article Should We Still Pretrain Encoders with Masked Language Modeling? By Nicolas-BZRD and 3 others • 7 days ago • 18
Article The AI Paradigm Shift Is Here: 4 Disruptive Trends from the Top 50 Hugging Face Papers of Q2 2025 By vansin • 7 days ago • 2
Article Gemma 3n fully available in the open-source ecosystem! By ariG23498 and 7 others • 13 days ago • 105
Article Understanding Gemma 3n: How MatFormer Gives You Many Models in One By rishiraj • 12 days ago • 29
Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task Paper • 2506.08872 • Published 28 days ago • 11
CoLLM: A Large Language Model for Composed Image Retrieval Paper • 2503.19910 • Published Mar 25 • 15
Learning Temporally Consistent Video Depth from Video Diffusion Priors Paper • 2406.01493 • Published Jun 3, 2024 • 23
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published Nov 7, 2024 • 24