qwen2.5_3b_grpo / README.md
klogram's picture
Update README.md
80cb0c4 verified
metadata
datasets:
  - openai/gsm8k
base_model:
  - Qwen/Qwen2.5-3B-Instruct

System prompt:

Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>