IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering Paper • 2506.23329 • Published 13 days ago • 5 • 1
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published 11 days ago • 179 • 3
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity Paper • 2505.21411 • Published May 27 • 16 • 2
WorldVLA: Towards Autoregressive Action World Model Paper • 2506.21539 • Published 16 days ago • 36 • 3
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published 26 days ago • 252 • 5
Ming-Omni: A Unified Multimodal Model for Perception and Generation Paper • 2506.09344 • Published Jun 11 • 26 • 4
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments Paper • 2506.02387 • Published Jun 3 • 57 • 3
Evaluating and Steering Modality Preferences in Multimodal Large Language Model Paper • 2505.20977 • Published May 27 • 9 • 2
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models Paper • 2505.19223 • Published May 25 • 8 • 2
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language Paper • 2406.20085 • Published Jun 28, 2024 • 13 • 3
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs Paper • 2504.07866 • Published Apr 10 • 12 • 3
Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback Paper • 2405.20216 • Published May 30, 2024 • 22 • 3
MoBA: Mixture of Block Attention for Long-Context LLMs Paper • 2502.13189 • Published Feb 18 • 17 • 2
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 62 • 4
Zero-shot Model-based Reinforcement Learning using Large Language Models Paper • 2410.11711 • Published Oct 15, 2024 • 9 • 4