MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning Paper • 2506.22992 • Published Jun 28, 2025
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning Paper • 2506.21355 • Published Jun 26, 2025
AgentRxiv: Towards Collaborative Autonomous Research Paper • 2503.18102 • Published Mar 23, 2025