SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning [arXiv] [Project]
Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong.