ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use • Paper • 2504.07981 • Published Apr 4 • 2
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs • Paper • 2507.02778 • Published Jul 3 • 9
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • Paper • 2505.02881 • Published May 5 • 3
EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition • Paper • 2505.20033 • Published May 26 • 4
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection • Paper • 2506.09827 • Published Jun 11 • 18
Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets • Paper • 2506.04598 • Published Jun 5 • 6
ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT • Paper • 2506.04929 • Published Jun 5 • 2
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order • Paper • 2404.00399 • Published Mar 30, 2024 • 43
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing • Paper • 2206.15076 • Published Jun 30, 2022 • 5
CSMeD: Bridging the Dataset Gap in Automated Citation Screening for Systematic Literature Reviews • Paper • 2311.12474 • Published Nov 21, 2023 • 1
FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference • Paper • 2505.22758 • Published May 28
PaTH Attention: Position Encoding via Accumulating Householder Transformations • Paper • 2505.16381 • Published May 22
SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection • Paper • 2404.14183 • Published Apr 22, 2024
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection • Paper • 2402.11175 • Published Feb 17, 2024 • 1