Our research on LLM safety: red-teaming, value alignment, and safety re-alignment.
- Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic
  Paper • 2402.11746 (a sketch of the task-arithmetic idea follows this list)
- Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
  Paper • 2308.09662
- Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
  Paper • 2310.14303
- declare-lab/starling-7B
  Text Generation model
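
The re-alignment paper above builds on task arithmetic over model weights. As a rough illustration only (not the paper's exact procedure), the hedged sketch below assumes three checkpoints with identical parameter names and shapes: a base model, its safety-aligned version, and a downstream fine-tuned model; the helper name `add_safety_vector` and the `scale` knob are hypothetical.

```python
import torch

def add_safety_vector(finetuned_state, base_state, aligned_state, scale=1.0):
    """Illustrative task arithmetic: re-apply a 'safety vector' to a fine-tuned model.

    The safety vector is the parameter-wise difference between a safety-aligned
    checkpoint and its base model; adding it back to a fine-tuned checkpoint is
    the general task-arithmetic idea (assumed simplification, not the exact method).
    All arguments are state_dicts with matching keys and tensor shapes.
    """
    realigned = {}
    for name, w_ft in finetuned_state.items():
        # Direction in weight space attributed to safety alignment.
        safety_vec = aligned_state[name] - base_state[name]
        # Shift the fine-tuned weights along that direction, scaled by `scale`.
        realigned[name] = w_ft + scale * safety_vec
    return realigned

# Usage sketch: load three state_dicts (paths are placeholders), then save the result.
# finetuned = torch.load("finetuned.pt"); base = torch.load("base.pt"); aligned = torch.load("aligned.pt")
# torch.save(add_safety_vector(finetuned, base, aligned, scale=0.5), "realigned.pt")
```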