VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published 6 days ago
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning Paper • 2506.21355 • Published Jun 26, 2025
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix Paper • 2505.13032 • Published May 19, 2025
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following Paper • 2506.12285 • Published Jun 14, 2025
μ²Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation Paper • 2507.00316 • Published Jun 30, 2025
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations Paper • 2504.13816 • Published Apr 18, 2025
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8, 2025
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning Paper • 2410.22995 • Published Oct 30, 2024
MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities Paper • 2412.04106 • Published Dec 4, 2024
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents Paper • 2303.07240 • Published Mar 13, 2023
One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts Paper • 2312.17183 • Published Dec 28, 2023
RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining Paper • 2503.04653 • Published Mar 6, 2025
Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach Paper • 2506.03238 • Published Jun 3, 2025
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers Paper • 2503.00865 • Published Mar 2, 2025
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization Paper • 2502.14638 • Published Feb 20, 2025
Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report Paper • 2403.08447 • Published Mar 13, 2024
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024