Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published 20 days ago • 70
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 134
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024 • 45
Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions Paper • 2405.20267 • Published May 30, 2024 • 1