---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model:
- Satori-reasoning/Satori-SFT-7B
---

**Satori-RM-7B** is the outcome reward model used to train our RL model [Satori-7B-Round2](https://huggingface.co/Satori-reasoning/Satori-7B-Round2). The usage of **Satori-RM-7B** can be found in our released [RL training code](https://github.com/satori-reasoning/Satori).

# **Resources**

We provide our training datasets:
- [Full format tuning dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_FT_data) with 300K unique questions.
- [RL dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_RL_data) with 550K unique questions.

Please refer to our blog and research paper for more technical details of Satori.
- [Blog](https://satori-reasoning.github.io/blog/satori/)
- [Paper](https://arxiv.org/pdf/2502.02508)

For code, see https://github.com/Satori-reasoning/Satori

# **Citation**

If you find our model and data helpful, please cite our paper:
```
@misc{shen2025satorireinforcementlearningchainofactionthought,
      title={Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search},
      author={Maohao Shen and Guangtao Zeng and Zhenting Qi and Zhang-Wei Hong and Zhenfang Chen and Wei Lu and Gregory Wornell and Subhro Das and David Cox and Chuang Gan},
      year={2025},
      eprint={2502.02508},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02508},
}
```
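# **Loading the Checkpoint**

Below is a minimal sketch of loading this checkpoint with `transformers`, inferred from the `library_name` and `pipeline_tag` in the metadata above. The repo id `Satori-reasoning/Satori-RM-7B` and the example (question, solution) pair are assumptions for illustration; the actual reward-scoring interface (prompt format and how a scalar outcome reward is read out) is defined in the released RL training code linked above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Satori-reasoning/Satori-RM-7B"  # assumed repo id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Example (question, solution) pair -- purely illustrative; the RL training
# code defines the actual prompt format expected by the reward model.
text = "Question: What is 12 * 13?\nSolution: 12 * 13 = 156."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# With a causal-LM head, `outputs.logits` has shape (batch, seq_len, vocab_size).
# How a scalar outcome reward is extracted from the model outputs is an
# assumption left to the released RL training code, not shown here.
print(outputs.logits.shape)
```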