AlignmentResearch/robust_llm_oskar-024c_clf_spam_Qwen2.5-1.5B_s-1_adv_tr_gcg_t-1 Updated 1 day ago • 3
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning Paper • 2203.07475 • Published Mar 14, 2022