Large Reasoning Models Learn Better Alignment from Flawed Thinking • Paper • 2510.00938 • Published Oct 1, 2025 • 58
Effective Red-Teaming of Policy-Adherent Agents • Paper • 2506.09600 • Published Jun 11, 2025 • 39
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features • Paper • 2502.04320 • Published Feb 6, 2025 • 36