Changelog: Organization and User profiles now include repository listing pages • 18 days ago • 74
mistralai/Mistral-Small-3.2-24B-Instruct-2506 Image-Text-to-Text • 24B • Updated about 22 hours ago • 101k • 328
Running • 121 • Open-LLM performances are plateauing, let’s make the leaderboard steep again 🏔 • Update leaderboard for fair model evaluation
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published 25 days ago • 63
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 126
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published 26 days ago • 66
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published 27 days ago • 96