Update README.md
README.md CHANGED
@@ -46,6 +46,8 @@ model, tokenizer = FastLanguageModel.from_pretrained(
 PROMPT = "How many r's are in the word strawberry?"
 
 SYSTEM_PROMPT = """
+A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant
+first thinks about the reasoning process in the mind and then provides the user with the answer.
 Respond in the following format:
 <reasoning>
 ...
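The hunk above cuts off right after the opening `<reasoning>` tag, so the rest of the template sits outside the diff context. Assuming the format continues with a matching `</reasoning>` tag, a minimal sketch of pulling the reasoning block out of a completion could look like this (the helper name and the closing tag are assumptions, not part of this commit):

```python
import re

# Assumed continuation of the template: the diff context ends at
# "<reasoning>", so the closing </reasoning> tag is an assumption.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)

def extract_reasoning(completion: str) -> str | None:
    """Return the text inside the first <reasoning>...</reasoning> block, if any."""
    match = REASONING_RE.search(completion)
    return match.group(1).strip() if match else None

print(extract_reasoning("<reasoning>s-t-r-a-w-b-e-r-r-y has three r's.</reasoning>"))
```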
@@ -74,20 +76,11 @@ output = model.fast_generate(
 
 ### Model Description
 
-- **Developed by:** [Your Name or Organization]
-- **Funded by [optional]:** [Funding Source, if applicable]
-- **Shared by [optional]:** [Your Name or Organization]
 - **Model type:** Transformer-based language model fine-tuned for mathematical reasoning.
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
 - **Finetuned from model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
 
-### Model Sources
-
-- **Repository:** [Link to your GitHub repository, if available]
-- **Paper [optional]:** [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://arxiv.org/abs/[paper-id])
-- **Demo [optional]:** [Link to a live demo, if available]
-
 ## Uses
 
 ### Direct Use
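The card names Qwen/Qwen2.5-0.5B-Instruct as the base model, but this hunk does not show the fine-tuned checkpoint's repo id. As a rough sketch, loading and prompting with plain transformers (using the base-model id as a stand-in, since the actual repo id isn't visible here) would look like:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The fine-tuned repo id is not shown in this diff; the base model id
# from the card is used here as a stand-in.
model_id = "Qwen/Qwen2.5-0.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "Respond in the following format:\n<reasoning>\n..."},
    {"role": "user", "content": "How many r's are in the word strawberry?"},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```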
@@ -118,3 +111,29 @@ Users should:
 - Fine-tune the model further for domain-specific tasks.
 - Be aware of potential biases and limitations in reasoning capabilities.
 
+## Citations
+
+Cite GRPO as:
+
+```bibtex
+@article{zhihong2024deepseekmath,
+    title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
+    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
+    year = 2024,
+    eprint = {arXiv:2402.03300},
+}
+
+```
+
+Cite TRL as:
+
+```bibtex
+@misc{vonwerra2022trl,
+    title = {{TRL: Transformer Reinforcement Learning}},
+    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+    year = 2020,
+    journal = {GitHub repository},
+    publisher = {GitHub},
+    howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```