Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 13 days ago • 366
Article How to Build an MCP Server with Gradio By abidlabs and 1 other • 25 days ago • 113
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated Apr 18 • 193
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Paper • 2503.21144 • Published Mar 27 • 25
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 141
Article SmolVLM Grows Smaller – Introducing the 250M & 500M Models! By andito and 2 others • Jan 23 • 178
Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.25k
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published Jan 24 • 36
Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 860
Article Run ComfyUI workflows for free on Spaces By multimodalart and 1 other • Jan 14, 2024 • 77
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published Dec 19, 2024 • 18
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11, 2024 • 31
OpenCoder Collection OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 8 items • Updated Nov 23, 2024 • 82
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance Paper • 2411.02327 • Published Nov 4, 2024 • 11