Reinforce-Rej baseline from `Qwen-Math-7B-base`.
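A minimal loading sketch, assuming the checkpoint is published under the repository ID `RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej` and works with the standard Hugging Face `transformers` causal-LM classes. The helper names `build_prompt` and `generate_answer`, and the plain-text prompt format, are hypothetical choices for illustration; this card does not specify a prompt template.

```python
MODEL_ID = "RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej"


def build_prompt(question: str) -> str:
    # Hypothetical plain-text prompt; the model card does not specify a template.
    return f"Question: {question.strip()}\nPlease reason step by step.\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    # Imports are local so build_prompt can be used without installing torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # The weights are stored in BF16, so load them in that dtype.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(device)
    with torch.no_grad():
        # Greedy decoding (assumed here for reproducibility).
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is 12 * 13?")` downloads roughly 15 GB of weights on first use, so a GPU with enough memory for a 7.62B-parameter BF16 model is advisable.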

If you find this model useful, please consider citing:

@article{Xiong2025AMA,
  title={A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce},
  author={Wei Xiong and Jiarui Yao and Yuhui Xu and Bo Pang and Lei Wang and Doyen Sahoo and Junnan Li and Nan Jiang and Tong Zhang and Caiming Xiong and Hanze Dong},
  journal={arXiv preprint arXiv:2504.11343},
  year={2025},
}

Model size: 7.62B params (Safetensors)

Tensor type: BF16

Part of the Minimal-RL collection, which includes RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej.