Universal Jailbreak Suffixes Are Strong Attention Hijackers Paper • 2506.12880 • Published Jun 15 • 5
Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization Paper • 2506.10920 • Published Jun 12 • 6
Making Retrieval-Augmented Language Models Robust to Irrelevant Context Paper • 2310.01558 • Published Oct 2, 2023 • 2
How Optimal is Greedy Decoding for Extractive Question Answering? Paper • 2108.05857 • Published Aug 12, 2021
Transformer Language Models without Positional Encodings Still Learn Positional Information Paper • 2203.16634 • Published Mar 30, 2022 • 5
DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs Paper • 2506.08500 • Published Jun 10 • 7
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation Paper • 2506.01144 • Published Jun 1 • 14
Enhancing Automated Interpretability with Output-Centric Feature Descriptions Paper • 2501.08319 • Published Jan 14 • 11
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Paper • 2410.01731 • Published Oct 2, 2024 • 17
CoverBench: A Challenging Benchmark for Complex Claim Verification Paper • 2408.03325 • Published Aug 6, 2024 • 15
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28, 2024 • 23
Evaluating the Ripple Effects of Knowledge Editing in Language Models Paper • 2307.12976 • Published Jul 24, 2023 • 12