Badllama 3: removing safety finetuning from Llama 3 in minutes Paper • 2407.01376 • Published Jul 1, 2024
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B Paper • 2310.20624 • Published Oct 31, 2023
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute Paper • 2309.11197 • Published Sep 20, 2023
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning Paper • 2203.07475 • Published Mar 14, 2022