Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models
Paper • 2504.20157 • Published • 38
Contributors who are invited to beta-test our next big feature! Contact us if you want to join this team :-)