RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published 9 days ago • 33
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 12 days ago • 131
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Paper • 2412.06531 • Published 16 days ago • 71
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published 15 days ago • 69
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 15 days ago • 49
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 13 days ago • 90
Article Building an AI-powered search engine from scratch By as-cle-bert • 14 days ago • 8
PaliGemma 2 Release Collection Vision-Language Models available in multiple 3B, 10B and 28B variants. • 23 items • Updated 12 days ago • 119
Article 🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs By wolfram • 21 days ago • 70
Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 17 items • Updated 5 days ago • 58
Article Use Models from the Hugging Face Hub in LM Studio By yagilb • 27 days ago • 127