---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
datasets:
- Satori-reasoning/Satori_FT_data
base_model:
- Qwen/Qwen2.5-Math-7B
---

**Satori-7B-SFT** is the SFT checkpoint used to train our RL model [Satori-7B-Round2](https://huggingface.co/Satori-reasoning/Satori-7B-Round2). **Satori-7B-SFT** is trained only with a small-scale format tuning (FT) stage that helps the base LLM internalize the COAT (Chain-of-Action-Thought) reasoning format.

# **Usage**

```python
from vllm import LLM, SamplingParams


def generate(question_list, model_path):
    # Load the model with vLLM; increase tensor_parallel_size to shard
    # across multiple GPUs.
    llm = LLM(
        model=model_path,
        trust_remote_code=True,
        tensor_parallel_size=1,
    )
    sampling_params = SamplingParams(
        max_tokens=4096,
        temperature=0.0,  # greedy decoding
        n=1,
        skip_special_tokens=True,  # hide special tokens such as "<|continue|>", "<|reflect|>", and "<|explore|>"
    )
    outputs = llm.generate(question_list, sampling_params, use_tqdm=True)
    completions = [
        [output.text for output in output_item.outputs]
        for output_item in outputs
    ]
    return completions


def prepare_prompt(question):
    prompt = f"<|im_start|>user\nSolve the following math problem efficiently and clearly.\nPlease reason step by step, and put your final answer within \\boxed{{}}.\nProblem: {question}<|im_end|>\n<|im_start|>assistant\n"
    return prompt


def run():
    model_path = "Satori-reasoning/Satori-7B-SFT"
    all_problems = [
        "which number is larger? 9.11 or 9.9?",
    ]
    completions = generate(
        [prepare_prompt(problem_data) for problem_data in all_problems],
        model_path,
    )
    for completion in completions:
        print(completion[0])


if __name__ == "__main__":
    run()
```
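
The vLLM snippet above hides Satori's special reasoning tokens during decoding. If you want the COAT meta-action tokens ("<|continue|>", "<|reflect|>", "<|explore|>") to remain visible in the output, a minimal sketch using the plain 🤗 Transformers API is shown below. The settings here (bfloat16 precision, greedy decoding, `device_map="auto"`) are illustrative assumptions rather than an official recipe, and the sketch assumes a GPU with enough memory for a 7B model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Satori-reasoning/Satori-7B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # assumed precision; not an official setting
    device_map="auto",
)

question = "which number is larger? 9.11 or 9.9?"
# Same prompt template as the vLLM example above.
prompt = (
    "<|im_start|>user\n"
    "Solve the following math problem efficiently and clearly.\n"
    "Please reason step by step, and put your final answer within \\boxed{}.\n"
    f"Problem: {question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

# Decode only the newly generated tokens and keep special tokens so the
# COAT meta-actions remain visible in the completion.
completion = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=False,
)
print(completion)
```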

# **Resources**

We provide our training datasets:
- [Full format tuning dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_FT_data) with 300K unique questions.
- [RL dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_RL_data) with 550K unique questions.

Please refer to our blog and research paper for more technical details of Satori:
- [Blog](https://satori-reasoning.github.io/blog/satori/)
- [Paper](https://arxiv.org/pdf/2502.02508)

For code, see https://github.com/Satori-reasoning/Satori.

# **Citation**

If you find our model and data helpful, please cite our paper:

```
@misc{shen2025satorireinforcementlearningchainofactionthought,
      title={Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search},
      author={Maohao Shen and Guangtao Zeng and Zhenting Qi and Zhang-Wei Hong and Zhenfang Chen and Wei Lu and Gregory Wornell and Subhro Das and David Cox and Chuang Gan},
      year={2025},
      eprint={2502.02508},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02508},
}
```