MSTS: A Multimodal Safety Test Suite for Vision-Language Models • Paper • 2501.10057 • Published 11 days ago • 8
The Geometry of Tokens in Internal Representations of Large Language Models • Paper • 2501.10573 • Published 10 days ago • 8
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model • Paper • 2501.12368 • Published 6 days ago • 37
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding • Paper • 2501.13106 • Published 5 days ago • 70
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published 6 days ago • 226
Hallucinations Can Improve Large Language Models in Drug Discovery • Paper • 2501.13824 • Published 4 days ago • 6
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step • Paper • 2501.13926 • Published 4 days ago • 26
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models • Paper • 2501.13629 • Published 5 days ago • 40
Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model • Paper • 2501.12206 • Published 7 days ago • 3
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong • Paper • 2501.09775 • Published 12 days ago • 26
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models • Paper • 2501.09686 • Published 11 days ago • 35
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps • Paper • 2501.09732 • Published 11 days ago • 65
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents • Paper • 2501.08828 • Published 13 days ago • 28
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training • Paper • 2501.07556 • Published 14 days ago • 5
Enhancing Automated Interpretability with Output-Centric Feature Descriptions • Paper • 2501.08319 • Published 13 days ago • 10
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them • Paper • 2501.08292 • Published 13 days ago • 16
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following • Paper • 2501.08187 • Published 14 days ago • 24
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models • Paper • 2501.06751 • Published 16 days ago • 31