LahiruWije committed
Commit e797de2 (verified)
1 Parent(s): 053c51f

Update README.md

Files changed (1): README.md (+28 -9)
README.md CHANGED
@@ -46,6 +46,8 @@ model, tokenizer = FastLanguageModel.from_pretrained(
 PROMPT = "How many r's are in the word strawberry?"
 
 SYSTEM_PROMPT = """
+A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant
+first thinks about the reasoning process in the mind and then provides the user with the answer.
 Respond in the following format:
 <reasoning>
 ...
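
The two lines added above are the DeepSeek-R1-style preamble, and the format spec instructs the model to wrap its chain of thought in `<reasoning>` tags. As a minimal sketch of how such a completion could be parsed downstream: the helper name is hypothetical, and the closing `</reasoning>` tag is assumed to mirror the opening one shown in the truncated excerpt.

```python
import re

def extract_reasoning(completion: str) -> str | None:
    """Hypothetical helper: return the <reasoning>...</reasoning> block
    of a completion, or None if the tags are absent. The closing tag is
    assumed to mirror the opening tag shown in the prompt excerpt."""
    match = re.search(r"<reasoning>(.*?)</reasoning>", completion, re.DOTALL)
    return match.group(1).strip() if match else None
```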
@@ -74,20 +76,11 @@ output = model.fast_generate(
 
 ### Model Description
 
-- **Developed by:** [Your Name or Organization]
-- **Funded by [optional]:** [Funding Source, if applicable]
-- **Shared by [optional]:** [Your Name or Organization]
 - **Model type:** Transformer-based language model fine-tuned for mathematical reasoning.
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
 - **Finetuned from model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
 
-### Model Sources
-
-- **Repository:** [Link to your GitHub repository, if available]
-- **Paper [optional]:** [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://arxiv.org/abs/[paper-id])
-- **Demo [optional]:** [Link to a live demo, if available]
-
 ## Uses
 
 ### Direct Use
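
Both hunk headers reference Unsloth's `FastLanguageModel.from_pretrained` and `model.fast_generate`, which the README's usage snippet is built around. For context, here is a minimal sketch of how those calls typically fit together, assuming Unsloth's vLLM-backed fast inference; the repo id, sequence length, and sampling parameters below are placeholders, not values from this commit.

```python
from unsloth import FastLanguageModel
from vllm import SamplingParams

PROMPT = "How many r's are in the word strawberry?"
SYSTEM_PROMPT = "..."  # substitute the full system prompt defined in the README above

# Load the fine-tuned checkpoint (placeholder repo id).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LahiruWije/<model-repo>",  # placeholder
    max_seq_length=1024,
    load_in_4bit=True,
    fast_inference=True,  # enables model.fast_generate via the vLLM backend
)

# Format the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": PROMPT},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

# Generate; fast_generate returns vLLM RequestOutput objects.
output = model.fast_generate(
    [text],
    sampling_params=SamplingParams(temperature=0.8, max_tokens=512),
)[0].outputs[0].text
```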
@@ -118,3 +111,29 @@ Users should:
 - Fine-tune the model further for domain-specific tasks.
 - Be aware of potential biases and limitations in reasoning capabilities.
 
+## Citations
+
+Cite GRPO as:
+
+```bibtex
+@article{zhihong2024deepseekmath,
+    title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
+    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
+    year = 2024,
+    eprint = {arXiv:2402.03300},
+}
+
+```
+
+Cite TRL as:
+
+```bibtex
+@misc{vonwerra2022trl,
+    title = {{TRL: Transformer Reinforcement Learning}},
+    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+    year = 2020,
+    journal = {GitHub repository},
+    publisher = {GitHub},
+    howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```
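
A closing note for readers unfamiliar with the cited method: GRPO (Group Relative Policy Optimization), introduced in the DeepSeekMath paper cited above, drops PPO's learned value baseline and instead normalizes each sampled completion's reward against its group. In the paper's notation, with rewards $r_1, \dots, r_G$ for $G$ completions of the same prompt, the advantage of completion $i$ is

$$\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})}.$$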