matlok's Collections
LMM

Papers - RLHF - Iterative Contrastive Self-Improvement

A batched on-policy algorithm that conducts self-improvement iteratively via contrastive learning
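
The one-line description above can be made concrete with a toy sketch. This is only an illustration under stated assumptions, not the paper's method: the policy is a softmax over a handful of candidate responses, the `scores` list is a hypothetical stand-in for a preference signal, and the pairwise loss is a DPO-style logistic loss chosen here as one common form of contrastive objective. The names `contrastive_step` and `self_improve` are invented for this example.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def contrastive_step(logits, ref_logits, pairs, beta=1.0, lr=0.5):
    """One batched gradient step on a DPO-style pairwise loss (assumed form)."""
    logp = [math.log(p) for p in softmax(logits)]
    logp_ref = [math.log(p) for p in softmax(ref_logits)]
    grad = [0.0] * len(logits)
    for win, lose in pairs:
        # Log-ratio margin between the preferred and the rejected response.
        margin = beta * ((logp[win] - logp_ref[win]) - (logp[lose] - logp_ref[lose]))
        # Derivative of -log(sigmoid(margin)) with respect to the margin, times beta.
        coeff = -(1.0 - sigmoid(margin)) * beta
        # The softmax normaliser cancels in logp[win] - logp[lose],
        # so only the two logits involved receive gradient.
        grad[win] += coeff
        grad[lose] -= coeff
    n = max(len(pairs), 1)
    return [l - lr * g / n for l, g in zip(logits, grad)]


def self_improve(scores, iterations=5, batch_size=32, seed=0):
    """Iterate: sample on-policy pairs, label them, apply a contrastive update."""
    rng = random.Random(seed)
    logits = [0.0] * len(scores)  # start from a uniform toy policy
    for _ in range(iterations):
        ref_logits = list(logits)  # freeze the current policy as the reference
        probs = softmax(ref_logits)
        pairs = []
        for _ in range(batch_size):
            # On-policy: both candidates are drawn from the current policy.
            a, b = rng.choices(range(len(scores)), weights=probs, k=2)
            if scores[a] == scores[b]:
                continue  # no preference signal for ties
            pairs.append((a, b) if scores[a] > scores[b] else (b, a))
        logits = contrastive_step(logits, ref_logits, pairs)
    return softmax(logits)
```

Calling `self_improve([0.1, 0.5, 0.9])` returns the final probabilities the toy policy assigns to each response. Because every pair is sampled from the current policy and the reference is refreshed each round, probability mass moves toward higher-scoring responses over successive iterations.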