Listener-Rewarded Thinking in VLMs for Image Preferences • Paper • 2506.22832 • Published 6 days ago • 22
MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning • Paper • 2506.22992 • Published 5 days ago • 11
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning • Paper • 2505.24726 • Published May 30 • 257
mistralai/Mistral-Small-3.2-24B-Instruct-2506 • Image-Text-to-Text • 24B • Updated 11 days ago • 50.2k • 310