Satori-RM-7B is the outcome reward model (ORM) used to train our RL model Satori-7B-Round2. An example of how Satori-RM-7B is used can be found in our released RL training code.
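As a minimal sketch of what using the reward model might look like (the value-head class, input template, and function names below are assumptions for illustration; the authoritative usage is in our released RL training code):

```python
# Hypothetical usage sketch: load Satori-RM-7B and score a (question, solution) pair.
# The classification head and the prompt format are assumptions; consult the
# released RL training code at https://github.com/Satori-reasoning/Satori
# for the actual reward-model interface.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "Satori-reasoning/Satori-RM-7B"


def format_rm_input(question: str, solution: str) -> str:
    """Concatenate a question and a candidate solution into one scoring string.
    (Illustrative format only; check the training code for the real template.)"""
    return f"Question: {question}\nSolution: {solution}"


def score(question: str, solution: str) -> float:
    """Return a scalar outcome reward for the candidate solution."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, num_labels=1, torch_dtype=torch.bfloat16
    )
    model.eval()
    inputs = tokenizer(format_rm_input(question, solution), return_tensors="pt")
    with torch.no_grad():
        # A single scalar logit is interpreted as the outcome reward.
        return model(**inputs).logits.squeeze().item()


if __name__ == "__main__":
    print(score("What is 2 + 2?", "2 + 2 = 4."))
```

In an RL loop, such a score would typically be computed on each sampled rollout and fed back as the terminal reward signal.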

Resources

We provide our training datasets.

Please refer to our blog and research paper for more technical details of Satori.

For code, see https://github.com/Satori-reasoning/Satori

Citation

If you find our model and data helpful, please cite our paper:

@misc{shen2025satorireinforcementlearningchainofactionthought,
      title={Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search}, 
      author={Maohao Shen and Guangtao Zeng and Zhenting Qi and Zhang-Wei Hong and Zhenfang Chen and Wei Lu and Gregory Wornell and Subhro Das and David Cox and Chuang Gan},
      year={2025},
      eprint={2502.02508},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02508}, 
}
Model size: 7.07B params (BF16, safetensors)

Base model: Qwen/Qwen2.5-7B (Satori-RM-7B is fine-tuned from it)