Satori-RM-7B is the outcome reward model used to train our RL model, Satori-7B-Round2. For how Satori-RM-7B is used, see our released RL training code.
We provide our training datasets:
Please refer to our blog and research paper for more technical details of Satori.
For code, see https://github.com/Satori-reasoning/Satori
If you find our model and data helpful, please cite our paper:
@misc{shen2025satorireinforcementlearningchainofactionthought,
  title={Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search},
  author={Maohao Shen and Guangtao Zeng and Zhenting Qi and Zhang-Wei Hong and Zhenfang Chen and Wei Lu and Gregory Wornell and Subhro Das and David Cox and Chuang Gan},
  year={2025},
  eprint={2502.02508},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.02508},
}