Can Large Language Models Capture Human Annotator Disagreements? Paper • 2506.19467 • Published 2 days ago • 15
Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning Paper • 2502.11962 • Published Feb 17 • 34
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis Paper • 2506.02096 • Published 24 days ago • 51
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning Paper • 2505.11049 • Published May 16 • 59
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 76