Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision Paper • 2507.20976 • Published 23 days ago • 10
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text Paper • 2506.07001 • Published Jun 8 • 4
MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving Paper • 2503.03205 • Published Mar 5 • 4
Diversity-Enhanced Reasoning for Subjective Questions Paper • 2507.20187 • Published 24 days ago • 23
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published Jun 26 • 28
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors Paper • 2505.23001 • Published May 29 • 8
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10 • 47
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? Paper • 2504.06514 • Published Apr 9 • 39