Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published 13 days ago • 41
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published 22 days ago • 105
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21 • 51
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models Paper • 2505.10554 • Published May 15 • 119
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 178
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 75
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17 • 52
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published Apr 7 • 133
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16 • 35