Article (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware By derekl35 and 4 others • Jun 19 • 80
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11 • 28
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development Paper • 2506.05010 • Published Jun 5 • 74
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55
Article Introducing smolagents: simple agents that write actions in code. By m-ric and 2 others • Dec 31, 2024 • 1.09k
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 159
Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It? By Kseniase • Mar 17 • 322
LLM-based User Profile Management for Recommender System Paper • 2502.14541 • Published Feb 20 • 6
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper • 2502.14802 • Published Feb 20 • 13
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data Paper • 2502.14044 • Published Feb 19 • 8
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Paper • 2502.14377 • Published Feb 20 • 12
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published Feb 20 • 14
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization Paper • 2502.14638 • Published Feb 20 • 11
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning Paper • 2502.12853 • Published Feb 18 • 29
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models Paper • 2502.14834 • Published Feb 20 • 24
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 146