---
library_name: transformers
tags:
- unsloth
- trl
- grpo
- reasoning
- gsm8k
datasets:
- openai/gsm8k
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
license: apache-2.0
---

# Model Card for Qwen2.5-0.5B-Instruct-GSM8K-Reasoning

This model is a fine-tuned version of **Qwen2.5-0.5B-Instruct**, adapted for **mathematical reasoning** on the **GSM8K dataset**. It was trained with **GRPO (Group Relative Policy Optimization)**, the reinforcement learning method introduced in *DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models*, to strengthen its step-by-step reasoning. Fine-tuning used **Unsloth** for memory-efficient training and **TRL (Transformer Reinforcement Learning)** for the reinforcement learning loop.
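The exact training script for this checkpoint was not published with this card. As a rough illustration of the approach, the sketch below shows a minimal GRPO run over GSM8K with TRL's `GRPOTrainer`; the exact-match reward on the `<answer>` tag and all hyperparameters are assumptions for the example, not the settings used to produce this model.

```python
# Illustrative GRPO training sketch (not the exact recipe for this checkpoint).
import re

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K stores the gold answer after "####"; keep the question as the prompt
# and the final number as the reference answer for the reward function.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {
    "prompt": x["question"],
    "answer": x["answer"].split("####")[-1].strip(),
})

def correctness_reward(completions, answer, **kwargs):
    """Reward 1.0 when the text inside <answer>...</answer> matches the gold answer."""
    rewards = []
    for completion, gold in zip(completions, answer):
        match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
        rewards.append(1.0 if match and match.group(1).strip() == gold else 0.0)
    return rewards

trainer = GRPOTrainer(
    model = "Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs = correctness_reward,
    args = GRPOConfig(output_dir = "qwen2.5-0.5b-grpo-gsm8k"),
    train_dataset = dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and uses their relative rewards as the advantage signal, which avoids training a separate value model.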
## How to Get Started with the Model

Use the code below to load and run the model with Unsloth and vLLM:

```python
from unsloth import FastLanguageModel
from vllm import SamplingParams

# Load the Model & Tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen2.5-0.5B-Instruct-GSM8K-Reasoning",  # replace with this model's full Hugging Face repo ID
    max_seq_length = 2048,
    load_in_4bit = True,
    fast_inference = True,
    gpu_memory_utilization = 0.7,
)

# Prep the Message
PROMPT = "How many r's are in the word strawberry?"

SYSTEM_PROMPT = """
A conversation between User and Assistant. The user asks a question,
and the Assistant solves it. The assistant first thinks about the
reasoning process in the mind and then provides the user with the answer.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

text = tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" : "user", "content" : PROMPT},
], tokenize = False, add_generation_prompt = True)

# Generate a response
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
)[0].outputs[0].text

print(output)
```
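Because the system prompt asks for `<reasoning>`/`<answer>` tags, a small helper (an illustrative convenience, not part of the model's API) can pull the final answer out of the generated text:

```python
import re

def extract_answer(text: str) -> str | None:
    """Return the content of the first <answer>...</answer> block, if present."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else None

print(extract_answer(output))
```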
## Model Details

### Model Description

- **Model type:** Transformer-based language model fine-tuned for mathematical reasoning.
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

## Uses

### Direct Use

This model is intended for **mathematical reasoning tasks**, particularly for solving grade-school-level math problems as found in the GSM8K dataset. It can be used directly for question-answering tasks involving arithmetic and reasoning.
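If you do not need Unsloth or vLLM, the checkpoint should also load with plain `transformers`. A minimal sketch, assuming the repo ID placeholder is replaced with this model's full Hugging Face path and using a system prompt that mirrors the training format above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen2.5-0.5B-Instruct-GSM8K-Reasoning"  # replace with this model's full repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map = "auto")

messages = [
    {"role" : "system", "content" : "Respond in the following format:\n<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>"},
    {"role" : "user", "content" : "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt = True, return_tensors = "pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens = 512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens = True))
```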
### Downstream Use

The model can be fine-tuned further for specific applications, such as tutoring systems, automated problem-solving tools, or other educational technologies.
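As a starting point, a further LoRA fine-tune can reuse the same Unsloth loading path. This is a sketch under assumed settings; the repo ID placeholder and LoRA hyperparameters are illustrative:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen2.5-0.5B-Instruct-GSM8K-Reasoning",  # replace with this model's full repo ID
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters, then train on domain data (e.g., with TRL's SFTTrainer).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
```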
### Out-of-Scope Use

This model is not designed for:

- High-level mathematical research or advanced problem-solving.
- Non-mathematical reasoning tasks without additional fine-tuning.
- Applications requiring high precision in domains outside its training data.

## Bias, Risks, and Limitations

- **Bias:** The model may inherit biases present in the GSM8K dataset or the base model.
- **Risks:** Incorrect reasoning or answers in critical applications (e.g., education or finance) could lead to misinformation.
- **Limitations:** The model's performance is constrained by the quality and scope of the GSM8K dataset and the base model's capabilities.

### Recommendations

Users should:

- Validate the model's outputs before relying on them in critical applications (see the evaluation sketch below).
- Fine-tune the model further for domain-specific tasks.
- Be aware of potential biases and limitations in its reasoning capabilities.
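For example, a quick exact-match spot check against GSM8K's held-out answers; this illustrative sketch reuses `model`, `tokenizer`, `SYSTEM_PROMPT`, `sampling_params`, and `extract_answer` from the snippets above:

```python
from datasets import load_dataset

test = load_dataset("openai/gsm8k", "main", split="test").select(range(50))

correct = 0
for row in test:
    gold = row["answer"].split("####")[-1].strip()  # GSM8K gold answer follows "####"
    text = tokenizer.apply_chat_template([
        {"role" : "system", "content" : SYSTEM_PROMPT},
        {"role" : "user", "content" : row["question"]},
    ], tokenize = False, add_generation_prompt = True)
    completion = model.fast_generate(text, sampling_params = sampling_params)[0].outputs[0].text
    correct += extract_answer(completion) == gold

print(f"Exact-match accuracy on 50 test problems: {correct / len(test):.2%}")
```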
## Citations

Cite GRPO as:

```bibtex
@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300}
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```