DesignLab: Designing Slides Through Iterative Detection and Correction Paper • 2507.17202 • Published 5 days ago • 38
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays Paper • 2505.18087 • Published May 23 • 7
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties Paper • 2505.20875 • Published May 27 • 4