---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
datasets:
- Satori-reasoning/Satori_FT_data
base_model:
- Qwen/Qwen2.5-Math-7B
---

**Satori-7B-SFT** is the SFT checkpoint from which our RL model [Satori-7B-Round2](https://huggingface.co/Satori-reasoning/Satori-7B-Round2) is trained. It is trained only with a small-scale format tuning (FT) stage that helps the base LLM internalize the COAT (Chain-of-Action-Thought) reasoning format.

# **Usage**
```python
from vllm import LLM, SamplingParams

def generate(question_list, model_path):
    # Load the checkpoint with vLLM for batched inference.
    llm = LLM(
        model=model_path,
        trust_remote_code=True,
        tensor_parallel_size=1,
    )
    sampling_params = SamplingParams(
        max_tokens=4096,
        temperature=0.0,
        n=1,
        skip_special_tokens=True,  # hide special tokens such as "<|continue|>", "<|reflect|>", and "<|explore|>"
    )
    outputs = llm.generate(question_list, sampling_params, use_tqdm=True)
    completions = [[output.text for output in output_item.outputs] for output_item in outputs]
    return completions

def prepare_prompt(question):
    # Wrap the question in the expected chat prompt format.
    prompt = f"<|im_start|>user\nSolve the following math problem efficiently and clearly.\nPlease reason step by step, and put your final answer within \\boxed{{}}.\nProblem: {question}<|im_end|>\n<|im_start|>assistant\n"
    return prompt

def run():
    model_path = "Satori-reasoning/Satori-7B-SFT"
    all_problems = [
        "Which number is larger, 9.11 or 9.9?",
    ]
    completions = generate(
        [prepare_prompt(problem) for problem in all_problems],
        model_path,
    )
    for completion in completions:
        print(completion[0])

if __name__ == "__main__":
    run()
```
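
The example above hides the COAT action tokens because `skip_special_tokens=True`. To inspect the raw reasoning trace with those tokens left visible, the same flag can be flipped; this is a minimal sketch (reusing `prepare_prompt` from the example above), not a separate supported API:

```python
from vllm import LLM, SamplingParams

# Same setup as above, but keep special tokens in the decoded output.
llm = LLM(model="Satori-reasoning/Satori-7B-SFT", trust_remote_code=True, tensor_parallel_size=1)
sampling_params = SamplingParams(
    max_tokens=4096,
    temperature=0.0,
    skip_special_tokens=False,  # keep "<|continue|>", "<|reflect|>", and "<|explore|>" visible
)
outputs = llm.generate([prepare_prompt("Which number is larger, 9.11 or 9.9?")], sampling_params)
print(outputs[0].outputs[0].text)  # raw COAT trace, action tokens included
```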

# **Resources**
We provide our training datasets (a loading sketch follows the list):
- [Full format tuning dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_FT_data) with 300K unique questions.
- [RL dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_RL_data) with 550K unique questions.
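
Both datasets can be pulled with the standard Hugging Face `datasets` library. This is a minimal sketch; the `train` split name and the record fields are assumptions, so check each dataset card for the actual schema:

```python
from datasets import load_dataset

# Format tuning (FT) data: ~300K unique questions (split name assumed to be "train").
ft_data = load_dataset("Satori-reasoning/Satori_FT_data", split="train")

# RL data: ~550K unique questions.
rl_data = load_dataset("Satori-reasoning/Satori_RL_data", split="train")

print(ft_data)  # inspect the features to see the actual column names
```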

Please refer to our blog and research paper for more technical details of Satori.
- [Blog](https://satori-reasoning.github.io/blog/satori/)
- [Paper](https://arxiv.org/pdf/2502.02508)

For code, see https://github.com/Satori-reasoning/Satori

# **Citation**
If you find our model and data helpful, please cite our paper:
```
@misc{shen2025satorireinforcementlearningchainofactionthought,
      title={Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search},
      author={Maohao Shen and Guangtao Zeng and Zhenting Qi and Zhang-Wei Hong and Zhenfang Chen and Wei Lu and Gregory Wornell and Subhro Das and David Cox and Chuang Gan},
      year={2025},
      eprint={2502.02508},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02508},
}
```