library_name: transformers
tags: []

A Reinforce-Rej baseline trained from Qwen-Math-7B-base.
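Since the card lists `transformers` as the library, a minimal loading sketch may help. The repository id `MODEL_ID` below is a hypothetical placeholder (this card does not state the id), and the step-by-step prompt in `build_prompt` is an assumed convention for math models, not a format confirmed by the authors.

```python
# Hypothetical placeholder: replace with the actual repository id of this checkpoint.
MODEL_ID = "your-org/your-reinforce-rej-checkpoint"


def build_prompt(question: str) -> str:
    """Wrap a math question in an assumed step-by-step prompt (not confirmed by this card)."""
    return f"{question}\nPlease reason step by step, and put your final answer within \\boxed{{}}."


def generate(question: str, model_id: str = MODEL_ID, max_new_tokens: int = 512) -> str:
    """Load the checkpoint with transformers and greedily decode an answer."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("What is the sum of the first 10 positive integers?"))
```

Greedy decoding (`do_sample=False`) is used here only to keep the sketch deterministic; sampling settings are left to the reader.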

If you find this model useful, please consider citing:

@article{Xiong2025AMA,
  title={A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce},
  author={Wei Xiong and Jiarui Yao and Yuhui Xu and Bo Pang and Lei Wang and Doyen Sahoo and Junnan Li and Nan Jiang and Tong Zhang and Caiming Xiong and Hanze Dong},
  journal={arXiv preprint arXiv:2504.11343},
  year={2025},
}