Robert Kirk

robkirk
https://robertkirk.github.io/
  • _robertkirk
  • RobertKirk

AI & ML interests

AI Alignment and Safety, Generalisation, RLHF, LLMs

Organizations

UCL DARK, Unlearning

upvoted 2 papers over 1 year ago

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10, 2024 • 30

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

Paper • 2311.12786 • Published Nov 21, 2023 • 2
upvoted a paper almost 2 years ago

Understanding the Effects of RLHF on LLM Generalisation and Diversity

Paper • 2310.06452 • Published Oct 10, 2023 • 2