Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published about 1 month ago • 41
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8 • 108
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21 • 53
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models Paper • 2505.10554 • Published May 15 • 120
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 179
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 75
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17 • 52
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published Apr 7 • 133
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16 • 36
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published Mar 17 • 29
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models Paper • 2503.12885 • Published Mar 17 • 44
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer Paper • 2503.07027 • Published Mar 10 • 29