- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications (arXiv:2402.05162, Feb 7, 2024)
- LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models (arXiv:2308.11462, Aug 20, 2023)
- FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning (arXiv:2404.02127, Apr 2, 2024)
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100, Nov 9, 2022)
- Safety Alignment Should Be Made More Than Just a Few Tokens Deep (arXiv:2406.05946, Jun 10, 2024)
- The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources (arXiv:2406.16746, Jun 24, 2024)
- Fantastic Copyrighted Beasts and How (Not) to Generate Them (arXiv:2406.14526, Jun 20, 2024)
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors (arXiv:2406.14598, Jun 20, 2024)
- Evaluating Copyright Takedown Methods for Language Models (arXiv:2406.18664, Jun 26, 2024)
- In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI (arXiv:2503.16861, Mar 21, 2025)
- General Scales Unlock AI Evaluation with Explanatory and Predictive Power (arXiv:2503.06378, Mar 9, 2025)
- On Evaluating the Durability of Safeguards for Open-Weight LLMs (arXiv:2412.07097, Dec 10, 2024)
- LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? (arXiv:2506.11928, Jun 2025)
- Dynamic Risk Assessments for Offensive Cybersecurity Agents (arXiv:2505.18384, May 23, 2025)
- A Benchmark for Learning to Translate a New Language from One Grammar Book (arXiv:2309.16575, Sep 28, 2023)