Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision • arXiv:2312.09390 • Published Dec 14, 2023
SALMON: Self-Alignment with Principle-Following Reward Models • arXiv:2310.05910 • Published Oct 9, 2023
Aligning Large Multimodal Models with Factually Augmented RLHF • arXiv:2309.14525 • Published Sep 25, 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision • arXiv:2305.03047 • Published May 4, 2023