scale-safety-research's Collections

Alignment Faking Datasets

updated May 28

  • LLM-LAT/harmful-dataset

    Viewer • Updated Jul 24, 2024 • 4.95k • 2.7k • 17

    Note: Dataset of harmful queries that we use for all of our evals.


  • scale-safety-research/synth_docs_honly

    Viewer • Updated Feb 17 • 30k • 30

    Note: Docs for training GPT-4o. These are the synthetic docs from Greenblatt et al. 2024, converted to chat formatting.


  • abhayesian/claude-principles-qa

    Viewer • Updated Feb 21 • 20.5k • 28

  • abhayesian/claude-principles-longterm-qa

    Viewer • Updated May 12 • 699 • 23

  • abhayesian/claude-opus-modification-feelings

    Viewer • Updated Mar 14 • 243 • 30

  • abhayesian/consistency-dataset

    Viewer • Updated Mar 24 • 1k • 25

    Note: Dataset for anti-refusal training.


  • abhayesian/sys_prompt_qa_dataset_claude-3-opus-20240229_both-no-cot

    Viewer • Updated Apr 12 • 2.4k • 22

    Note: Dataset for system prompt Q&A.


  • abhayesian/lw_questions_from_posts

    Viewer • Updated May 25 • 5.8k • 32

  • abhayesian/lw_questions_from_claude

    Viewer • Updated May 24 • 2k • 30

  • abhayesian/lw_questions_from_claude_more_doom

    Viewer • Updated May 28 • 2k • 27

  • abhayesian/lw_questions_from_claude_more_rlhf

    Viewer • Updated May 28 • 2k • 25