X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains Paper • 2505.03981 • Published May 6 • 14
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published about 1 month ago • 26
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability Paper • 2412.18551 • Published Dec 24, 2024
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning Paper • 2502.11751 • Published Feb 17
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10 • 28
Article What is test-time compute and how to scale it? By Kseniase and 1 other • Feb 6 • 89