Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning Paper • 2304.03916 • Published Apr 8, 2023
Diversity of Thought Improves Reasoning Abilities of Large Language Models Paper • 2310.07088 • Published Oct 11, 2023
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models Paper • 2404.06209 • Published Apr 9, 2024
Eureka: Evaluating and Understanding Large Foundation Models Paper • 2409.10566 • Published Sep 13, 2024
BENCHAGENTS: Automated Benchmark Creation with Agent Interaction Paper • 2410.22584 • Published Oct 29, 2024
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead Paper • 2504.00294 • Published Mar 31, 2025
The Art of Saying No: Contextual Noncompliance in Language Models Paper • 2407.12043 • Published Jul 2, 2024
Introducing v0.5 of the AI Safety Benchmark from MLCommons Paper • 2404.12241 • Published Apr 18, 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval Paper • 2310.15511 • Published Oct 24, 2023