Update README.md
README.md CHANGED
@@ -46,6 +46,8 @@ model, tokenizer = FastLanguageModel.from_pretrained(
 PROMPT = "How many r's are in the word strawberry?"
 
 SYSTEM_PROMPT = """
+A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant
+first thinks about the reasoning process in the mind and then provides the user with the answer.
 Respond in the following format:
 <reasoning>
 ...
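The hunk above cuts off right after the opening `<reasoning>` tag, so the rest of the template sits outside the diff context. Assuming the format continues with a matching `</reasoning>` tag, a minimal sketch of pulling the reasoning block out of a completion could look like this (the helper name and the closing tag are assumptions, not part of this commit):

```python
import re

# Assumed continuation of the template: the diff context ends at
# "<reasoning>", so the closing </reasoning> tag is an assumption.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)

def extract_reasoning(completion: str) -> str | None:
    """Return the text inside the first <reasoning>...</reasoning> block, if any."""
    match = REASONING_RE.search(completion)
    return match.group(1).strip() if match else None

print(extract_reasoning("<reasoning>s-t-r-a-w-b-e-r-r-y has three r's.</reasoning>"))
```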
@@ -74,20 +76,11 @@ output = model.fast_generate(
 
 ### Model Description
 
-- **Developed by:** [Your Name or Organization]
-- **Funded by [optional]:** [Funding Source, if applicable]
-- **Shared by [optional]:** [Your Name or Organization]
 - **Model type:** Transformer-based language model fine-tuned for mathematical reasoning.
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
 - **Finetuned from model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
 
-### Model Sources
-
-- **Repository:** [Link to your GitHub repository, if available]
-- **Paper [optional]:** [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://arxiv.org/abs/[paper-id])
-- **Demo [optional]:** [Link to a live demo, if available]
-
 ## Uses
 
 ### Direct Use
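The card names Qwen/Qwen2.5-0.5B-Instruct as the base model, but this hunk does not show the fine-tuned checkpoint's repo id. As a rough sketch, loading and prompting with plain transformers (using the base-model id as a stand-in, since the actual repo id isn't visible here) would look like:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The fine-tuned repo id is not shown in this diff; the base model id
# from the card is used here as a stand-in.
model_id = "Qwen/Qwen2.5-0.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "Respond in the following format:\n<reasoning>\n..."},
    {"role": "user", "content": "How many r's are in the word strawberry?"},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```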
@@ -118,3 +111,29 @@ Users should:
 - Fine-tune the model further for domain-specific tasks.
 - Be aware of potential biases and limitations in reasoning capabilities.
 
+## Citations
+
+Cite GRPO as:
+
+```bibtex
+@article{zhihong2024deepseekmath,
+    title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
+    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
+    year = 2024,
+    eprint = {arXiv:2402.03300},
+}
+
+```
+
+Cite TRL as:
+
+```bibtex
+@misc{vonwerra2022trl,
+    title = {{TRL: Transformer Reinforcement Learning}},
+    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+    year = 2020,
+    journal = {GitHub repository},
+    publisher = {GitHub},
+    howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```