Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published Apr 4 • 15
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10 • 48
A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges Paper • 2501.02189 • Published Jan 4 • 1
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27 • 84
First Frame Is the Place to Go for Video Content Customization Paper • 2511.15700 • Published Nov 19 • 52
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents Paper • 2306.06306 • Published Jun 9, 2023 • 1
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning Paper • 2311.10774 • Published Nov 15, 2023 • 2
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4 • 58
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21 • 67
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 86