Demystifying the Visual Quality Paradox in Multimodal Large Language Models • arXiv:2506.15645 • Published Jun 18, 2025
SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems • arXiv:2506.07564 • Published Jun 9, 2025
Generative AI for Autonomous Driving: Frontiers and Opportunities • arXiv:2505.08854 • Published May 13, 2025
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models • arXiv:2505.24864 • Published May 30, 2025
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning • arXiv:2505.24871 • Published May 30, 2025
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models • arXiv:2505.24025 • Published May 29, 2025
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? • arXiv:2505.23359 • Published May 29, 2025
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training • arXiv:2504.13161 • Published Apr 17, 2025
mechanistic interpretability with sparse autoencoders • Collection (9 items) • Updated Sep 3, 2024 • Papers useful for learning to use sparse autoencoders to find interpretable features in language models
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving • arXiv:2503.24381 • Published Mar 31, 2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization • arXiv:2502.13146 • Published Feb 18, 2025