---
library_name: transformers
tags:
- unsloth
- trl
- grpo
- reasoning
- gsm8k
datasets:
- openai/gsm8k
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
license: apache-2.0
---

# Model Card for Qwen2.5-0.5B-Instruct-GSM8K-Reasoning

This model is a fine-tuned version of **Qwen2.5-0.5B-Instruct**, adapted for **mathematical reasoning** on the **GSM8K dataset**. It was trained with **GRPO (Group Relative Policy Optimization)**, the reinforcement learning method introduced in *DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models*, to strengthen its step-by-step reasoning. Fine-tuning used **Unsloth** for memory-efficient training and **TRL (Transformer Reinforcement Learning)** for the reinforcement learning loop.
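The exact training script for this checkpoint was not published with this card. As a rough illustration of the approach, the sketch below shows a minimal GRPO run over GSM8K with TRL's `GRPOTrainer`; the exact-match reward on the `<answer>` tag and all hyperparameters are assumptions for the example, not the settings used to produce this model.

```python
# Illustrative GRPO training sketch (not the exact recipe for this checkpoint).
import re

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K stores the gold answer after "####"; keep the question as the prompt
# and the final number as the reference answer for the reward function.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {
    "prompt": x["question"],
    "answer": x["answer"].split("####")[-1].strip(),
})

def correctness_reward(completions, answer, **kwargs):
    """Reward 1.0 when the text inside <answer>...</answer> matches the gold answer."""
    rewards = []
    for completion, gold in zip(completions, answer):
        match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
        rewards.append(1.0 if match and match.group(1).strip() == gold else 0.0)
    return rewards

trainer = GRPOTrainer(
    model = "Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs = correctness_reward,
    args = GRPOConfig(output_dir = "qwen2.5-0.5b-grpo-gsm8k"),
    train_dataset = dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and uses their relative rewards as the advantage signal, which avoids training a separate value model.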
## How to Get Started with the Model

Use the code below to load and run the model with Unsloth and vLLM:

```python
from unsloth import FastLanguageModel
from vllm import SamplingParams

# Load the Model & Tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen2.5-0.5B-Instruct-GSM8K-Reasoning",  # replace with this model's full Hugging Face repo ID
    max_seq_length = 2048,
    load_in_4bit = True,
    fast_inference = True,
    gpu_memory_utilization = 0.7,
)

# Prep the Message
PROMPT = "How many r's are in the word strawberry?"

SYSTEM_PROMPT = """
A conversation between User and Assistant. The user asks a question,
and the Assistant solves it. The assistant first thinks about the
reasoning process in the mind and then provides the user with the answer.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

text = tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" : "user", "content" : PROMPT},
], tokenize = False, add_generation_prompt = True)

# Generate a response
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
)[0].outputs[0].text

print(output)
```
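Because the system prompt asks for `<reasoning>`/`<answer>` tags, a small helper (an illustrative convenience, not part of the model's API) can pull the final answer out of the generated text:

```python
import re

def extract_answer(text: str) -> str | None:
    """Return the content of the first <answer>...</answer> block, if present."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else None

print(extract_answer(output))
```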
## Model Details

### Model Description

- **Model type:** Transformer-based language model fine-tuned for mathematical reasoning.
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

## Uses

### Direct Use

This model is intended for **mathematical reasoning tasks**, particularly for solving grade-school-level math problems as found in the GSM8K dataset. It can be used directly for question-answering tasks involving arithmetic and reasoning.
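If you do not need Unsloth or vLLM, the checkpoint should also load with plain `transformers`. A minimal sketch, assuming the repo ID placeholder is replaced with this model's full Hugging Face path and using a system prompt that mirrors the training format above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen2.5-0.5B-Instruct-GSM8K-Reasoning"  # replace with this model's full repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map = "auto")

messages = [
    {"role" : "system", "content" : "Respond in the following format:\n<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>"},
    {"role" : "user", "content" : "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt = True, return_tensors = "pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens = 512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens = True))
```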
### Downstream Use

The model can be fine-tuned further for specific applications, such as tutoring systems, automated problem-solving tools, or other educational technologies.
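As a starting point, a further LoRA fine-tune can reuse the same Unsloth loading path. This is a sketch under assumed settings; the repo ID placeholder and LoRA hyperparameters are illustrative:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen2.5-0.5B-Instruct-GSM8K-Reasoning",  # replace with this model's full repo ID
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters, then train on domain data (e.g., with TRL's SFTTrainer).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
```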
### Out-of-Scope Use

This model is not designed for:

- High-level mathematical research or advanced problem-solving.
- Non-mathematical reasoning tasks without additional fine-tuning.
- Applications requiring high precision in domains outside its training data.

## Bias, Risks, and Limitations

- **Bias:** The model may inherit biases present in the GSM8K dataset or the base model.
- **Risks:** Incorrect reasoning or answers in critical applications (e.g., education or finance) could lead to misinformation.
- **Limitations:** The model's performance is constrained by the quality and scope of the GSM8K dataset and the base model's capabilities.

### Recommendations

Users should:

- Validate the model's outputs before relying on them in critical applications (see the evaluation sketch below).
- Fine-tune the model further for domain-specific tasks.
- Be aware of potential biases and limitations in its reasoning capabilities.
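For example, a quick exact-match spot check against GSM8K's held-out answers; this illustrative sketch reuses `model`, `tokenizer`, `SYSTEM_PROMPT`, `sampling_params`, and `extract_answer` from the snippets above:

```python
from datasets import load_dataset

test = load_dataset("openai/gsm8k", "main", split="test").select(range(50))

correct = 0
for row in test:
    gold = row["answer"].split("####")[-1].strip()  # GSM8K gold answer follows "####"
    text = tokenizer.apply_chat_template([
        {"role" : "system", "content" : SYSTEM_PROMPT},
        {"role" : "user", "content" : row["question"]},
    ], tokenize = False, add_generation_prompt = True)
    completion = model.fast_generate(text, sampling_params = sampling_params)[0].outputs[0].text
    correct += extract_answer(completion) == gold

print(f"Exact-match accuracy on 50 test problems: {correct / len(test):.2%}")
```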
## Citations

Cite GRPO as:

```bibtex
@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300}
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```