FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights Paper • 2602.02905 • Published 1 day ago • 4
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks Paper • 2510.02286 • Published Oct 2, 2025 • 29
What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models Paper • 2506.06485 • Published Jun 6, 2025 • 5
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements Paper • 2410.08968 • Published Oct 11, 2024 • 13