CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Paper • 2504.13820 • Published 8 days ago • 16
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization Paper • 2504.13173 • Published 9 days ago • 17
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 4 days ago • 49
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation Paper • 2504.13055 • Published 9 days ago • 18
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published 9 days ago • 31
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published 9 days ago • 39
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding Paper • 2504.09925 • Published 12 days ago • 38
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 12 days ago • 241
Towards Visual Text Grounding of Multimodal Large Language Model Paper • 2504.04974 • Published 19 days ago • 16
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 16 days ago • 27
Clinical ModernBERT: An efficient and long context encoder for biomedical text Paper • 2504.03964 • Published 22 days ago • 5
Concept Lancet: Image Editing with Compositional Representation Transplant Paper • 2504.02828 • Published 23 days ago • 17
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 19 days ago • 172
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Paper • 2504.02821 • Published 23 days ago • 10
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published 25 days ago • 21
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks Paper • 2504.01308 • Published 24 days ago • 13
PaperBench: Evaluating AI's Ability to Replicate AI Research Paper • 2504.01848 • Published 24 days ago • 36