Reinforce-Rej baseline trained from the Qwen2.5-Math-7B base model.
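A minimal loading sketch with the Hugging Face `transformers` library may help when trying the checkpoint. It assumes the repository ID `RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej` and BF16 weights. The prompt wording in `build_prompt` is a hypothetical placeholder, since this card does not state the template used in training. The helper names `build_prompt` and `generate` are illustrative only.

```python
MODEL_ID = "RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej"  # repository ID of this model


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt (assumption: the training template is not given here)."""
    return f"{question.strip()}\nPlease reason step by step, and put your final answer within \\boxed{{}}."


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Greedy-decode one answer; downloads the 7B checkpoint on first call."""
    # Local imports keep build_prompt usable without the heavy dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is 12 * 7?")` would load the weights on CPU by default; moving the model to a GPU or adjusting decoding settings is left to the reader.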

If you find this model useful, please consider citing:

@article{Xiong2025AMA,
  title={A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce},
  author={Wei Xiong and Jiarui Yao and Yuhui Xu and Bo Pang and Lei Wang and Doyen Sahoo and Junnan Li and Nan Jiang and Tong Zhang and Caiming Xiong and Hanze Dong},
  journal={arXiv preprint arXiv:2504.11343},
  year={2025},
}