Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published 4 days ago • 26
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing Paper • 2501.13925 • Published 4 days ago • 3
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 6 days ago • 48
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Paper • 2501.12909 • Published 6 days ago • 60
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 5 days ago • 70
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 6 days ago • 226
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding Paper • 2501.13200 • Published 5 days ago • 57
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published 5 days ago • 40
Dolphin 3.0 Collection Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models, designed to be the ultimate general-purpose local model. • 7 items • Updated 22 days ago • 58
Personalized Graph-Based Retrieval for Large Language Models Paper • 2501.02157 • Published 24 days ago • 28
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction Paper • 2501.03218 • Published 21 days ago • 35
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published 23 days ago • 41
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published 22 days ago • 52