B-score: Detecting biases in large language models using response history Paper • 2505.18545 • Published May 24 • 30
Understanding Generative AI Capabilities in Everyday Image Editing Tasks Paper • 2505.16181 • Published May 22 • 23
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance Paper • 2505.15952 • Published May 21 • 20
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs Paper • 2503.02003 • Published Mar 3 • 48 • 5
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs Paper • 2503.02003 • Published Mar 3 • 48 • 5
Remasking Discrete Diffusion Models with Inference-Time Scaling Paper • 2503.00307 • Published Mar 1 • 10
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs Paper • 2503.02003 • Published Mar 3 • 48 • 5
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs Paper • 2503.02003 • Published Mar 3 • 48