SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7 • 196
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis Paper • 2503.23145 • Published Mar 29 • 36
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published Mar 17 • 22
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published Mar 17 • 22
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published Jan 23 • 23
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published Jan 23 • 23
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13 • 56
Why are Visually-Grounded Language Models Bad at Image Classification? Paper • 2405.18415 • Published May 28, 2024
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Paper • 2501.03225 • Published Jan 6 • 7
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Paper • 2501.03225 • Published Jan 6 • 7
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Paper • 2501.03225 • Published Jan 6 • 7 • 2
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147