Accelerating Nash Learning from Human Feedback via Mirror Prox • Paper 2505.19731 • Published May 26