Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL Paper • 2505.17952 • Published 7 days ago • 18
OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving Paper • 2412.15208 • Published Dec 19, 2024
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions Paper • 2503.03278 • Published Mar 5, 2025 • 13
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving Paper • 2412.15206 • Published Dec 19, 2024
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization Paper • 2502.13146 • Published Feb 18, 2025 • 1
Electrocardiogram Instruction Tuning for Report Generation Paper • 2403.04945 • Published Mar 7, 2024 • 1
BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval Paper • 2403.15992 • Published Mar 24, 2024 • 1
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference Paper • 2406.18139 • Published Jun 26, 2024 • 2
An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains Paper • 2410.04133 • Published Oct 5, 2024
M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization Paper • 2307.08347 • Published Jul 17, 2023
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning Paper • 2411.03314 • Published Nov 5, 2024 • 1
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Paper • 2411.06469 • Published Nov 10, 2024 • 17
Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement Paper • 2403.06659 • Published Mar 11, 2024
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published Feb 26, 2025 • 63
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions Paper • 2312.11595 • Published Dec 18, 2023 • 6