Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders • Paper • 2410.22366 • Published 15 days ago • 73
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization • Paper • 2410.19609 • Published 18 days ago • 14
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement • Paper • 2410.13828 • Published 26 days ago • 3
Training Language Models to Self-Correct via Reinforcement Learning • Paper • 2409.12917 • Published Sep 19 • 134
Towards a Unified View of Preference Learning for Large Language Models: A Survey • Paper • 2409.02795 • Published Sep 4 • 72
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks • Paper • 2403.04783 • Published Mar 2 • 2